Value-instance-connectivity computer-implemented database field of the invention

ABSTRACT

A computer-implemented database and method providing an efficient, ordered reduced space representation of multi-dimensional data. The data values for each attribute are stored in a manner that provides an advantage in, for example space usage and/or speed of access, such as in condensed form and/or sort order. Instances of each data value for an attribute are identified by instance elements, each of which is associated with one data value. Connectivity information is provided for each instance element that uniquely associates each instance element with a specific instance of a data value for another attribute.

CROSS-REFERENCE TO RELATED APPLICATIONS

This application is a continuation of application Ser. No. 10/629,812filed on Jul. 28, 2003, which is a division of application Ser. No.09/412,158 filed on Oct. 15, 1999, now issued as U.S. Pat. No.6,606,638, which is a continuation of application Ser. No. 09/112,078filed on Jul. 8, 1998, now issued as U.S. Pat. No. 6,009,432.

FIELD OF INVENTION

The present invention relates generally to computer-implementeddatabases and, in particular, to an efficient, ordered, reduced-spacerepresentation of multi-dimensional data.

BACKGROUND OF THE INVENTION

State of the art database management systems (DBMS's), like theunderlying data files out of which and on top of which they historicallygrew, continue to store and manipulate data in a manner that closelymirrors the users' view of the data. Users typically think of data as asequence of records (or “tuples”), each logically composed of a fixednumber of “fields” (or “attributes”) that contain specific content aboutthe entity described by that record. This view is naturally representedby a logical table (or “relation”) structure (referred to herein as a“record-based table”), such as a rectilinear grid, in which the rowsrepresent records and the columns represent fields.

The long-standing existence of record-based tables and theircorrespondence to a conventional user view, in the absence of generallyrecognized drawbacks, has led to their nearly universal acceptance asthe major underlying internal representation of databases. Yetrecord-based tables contain key structural weaknesses including highlevels of unorderedness and redundancy that have traditionally beenregarded as unavoidable. For example, such tables can be sorted orgrouped (i.e., the contiguous positioning of identical values) on atmost one criterion (based upon column values or some function of eithercolumn values or multiple column values). This limitation rendersessential database functions, such as querying and updating, on allcriteria other than this privileged one awkward and overlyresource-intensive.

The above deficiencies inhere in the fundamental properties of therecord-based table structure, in particular, the requirement that thepositioning of each field be made co-linear with all other fields in thesame record. This arbitrary positioning of fields in record-based tablestructures excludes all other arrangements. It thus obscures natural andexploitable latent data relationships that are revealed by more ordered,condensed and efficient data arrangements. Moreover, the inability ofrecord-based tables to effectively group or sort data leads to negativecharacteristics of state of the art DBMS's such as unorderedness,redundancy, cumbersomeness, algorithmic inefficiencies and performanceinstabilities.

Database research provides palliatives for these problems, but fails touncover and address their underlying cause (i.e., the reliance onrecord-based table structures). For example, the inability to representa natural, multi-dimensional grouping within the confines of arecord-based table structure has led to the creation of index-based datastructures. These supplementary structures are inherently and oftenmassively redundant, but they establish groupings and orderings thatcannot be directly represented using a conventional table. Index-basedstructures typically grow to be overly lengthy, convoluted and arecumbersome to maintain, optimize and especially update. Examples ofcommon indexes are b-trees, t-trees, star-indexes, and various bit maps.

Other supplementary structures developed in the prior art have differentdrawbacks. For example, hash tables can provide rapid querying ofindividual data items, but their lack of sort ordering render themunsuitable for range queries or for any other operation that requiresreturning data in a specific order.

The ability to maintain an ordered, non-redundant, multi-dimensionaldata set, using flexible sorting and/or grouping criteria, is extremelyuseful to database management. Sorted data makes rapid searching andupdating possible via, for example, binary search algorithms andinsertion sorts. Grouped data enables condensation that reduces spacerequirements and further increases the speed of, for example, searchingand updating.

A system of data storage in which most or all columns of a data tablecan be stored in grouped and/or sorted order is thus extremelydesirable. Previous studies have investigated “fully inverteddatabases,” which index each column through traditional methods,preserving all the inadequacies of records and indexes. Additionally,the bloated storage requirements necessary to accommodate completeindexing tend to make fully inverted databases impractical, especially,but not only, in main memory databases.

SUMMARY OF THE INVENTION

It is therefore an object of the present invention to provide a fully orpartially ordered (e.g., grouped and/or sorted) database without thedeficiencies characteristic of the prior art, as mentioned above.

Briefly, instead of structuring a database as a table in which each rowis a record and each column contains the fields in the record, as inearlier databases, the present invention permutes or otherwise modifiesthe columns to provide an advantage in, for example, space usage and/orspeed of access, such that the rows no longer necessarily correspond toindividual records. For example, one such modification is to condensethe column by eliminating redundant values (which reduces memory usage);another is sort-ordering the column, ensuring that value groups willalways appear in some particular order (which can greatly reduce thetime required to search a column for a particular value); still anotheris to both condense and sort a column. Other permutations andmodifications with other advantages are also possible. The table ofpermuted/modified values is referred to herein as the “value table.”

Logically, though not necessarily physically, separate data structuresprovide the information needed to reconstruct the “records” in thedatabase. In particular, they provide “instance” and “connectivity”information, where instance information identifies the instances of eachvalue in the field that is in a record and connectivity informationassociates each instance with a specific instance of a value in at leastone other field.

In one embodiment of the invention, both the instance and connectivityinformation is provided in a table, referred to herein as the “instancetable.” Each column in the instance table corresponds to an attribute ofthe records in the database and is associated with a column in the valuetable that contains the values for that attribute (and possibly otherattributes). Each cell (row/column location) in the instance table has aposition (in one embodiment of the invention, its row number) and aninstance value (the contents of the cell). An associated cell in theassociated column of the value table is derived from each instancecell's position. Also, an associated instance cell in another column ofthe instance table that belongs to the same record is derived from eachinstance cell's instance value. Thus, in this embodiment, an instancecell's position identifies the value which the cell is an instance ofand an instance cell's contents provides the connectivity informationassociating the instance with another instance cell in another field. Arecord can then be reconstructed starting at a cell in the instancetable by deriving, from the cell's position, the associated value cellin the value table and, from the cell's instance value, the position ofthe associated instance cell, and repeating this process at theassociated instance cell and so forth, with a last cell in the chainproviding, in one embodiment, the corresponding position of the startingcell.

If a column of the value table is sorted but not condensed, the valuetable column and the associated column in the instance table has, in oneembodiment of the invention, the same number of rows. An instance cell'sassociated value cell is, in this one embodiment, the value cell in theassociated value table column having the same row number as the instancecell. An instance cell's associated instance cell (i.e., cell in anothercolumn of the instance table belonging to the same record) is the cellin a specified column having the row number given by the instance cell'sinstance value. In one embodiment, the specified column is the nextcolumn in the instance table with the last column referring back to thefirst column. For example, if column 1 of the value table is uncondensedand, after permutation, column 1, row 2 and column 2, row 5 of the valuetable belong to the same record and an instance of column 2, row 5 is atcolumn 2, row 5 of the instance table, the instance table at column 1,row 2 would contain the number 5 (indicating that row 5 of the nextcolumn belongs to the same record).

If a value table column is condensed, there is in general no longer aone-to-one correspondence between that column and an instance tablecolumn that is associated with it. In this case, a table, referred toherein as a “displacement table,” is provided that, in one embodiment ofthe invention, has a column for each instance table column associatedwith a condensed value table column and specifies the range of instancetable row numbers associated with each row of the value table column.The value cell associated with an instance cell is then determined bythe corresponding displacement table column based on the instance cell'sposition (row number). In one embodiment, a displacement table columnhas the same number of rows as an associated value table column witheach cell in the displacement table providing the first row number inthe range of instance table row numbers associated with thecorresponding value cell. Alternatively, each cell in the displacementtable could, for example, provide the last row number in the range ofinstance table row numbers, the total number of rows in the range, orsome other value from which it is possible to derive the range ofinstance table row numbers associated with each value cell (i.e., theinstances of each value).

One drawback of the displacement table as just described, is thatsearching the displacement table for the value cell corresponding to aninstance cell slows record reconstruction. This drawback is addressed instill another embodiment of the invention in which the instance value ofan instance cell whose associated instance cell is in a column having adisplacement column is set to the position of the value cell associatedwith the associated instance cell (as opposed to the position of theassociated instance cell itself, as in the embodiment described above).The value of the associated instance cell is then directly obtainablewithout a search of the displacement table. In this embodiment, a table,referred to herein as an “occurrence table,” provides information fordetermining the associated instance cell.

In one embodiment of the occurrence table, each column in the instancetable that has cells with instance values as just described has anassociated column in the occurrence table that has the same number ofrows. A cell in the occurrence table is associated with a cell in theinstance table based, in this embodiment, on its position and specifiesan offset. The offset is added to the first row number in the range ofinstance table row numbers associated with the value cell to arrive atthe associated instance cell. The first row number is derived from thedisplacement table based on the instance value of the instance cell. Theconnectivity information for an instance cell is thus provided in thisembodiment by the instance cell's contents, the occurrence table and thedisplacement table.

The data structures described herein may be, but need not be, entirelyin RAM or distributed across a network comprised of a multiplicity ofdata processors. They may also be implemented in a variety of ways andthe invention herein is in no way limited to the examples given ofparticular implementations. For example one embodiment may involve onlypartly storing the data set using the computer-implemented database andmethods described herein, with the remainder stored using traditionaltable-based methods. Information may be stored in various formats andthe invention is not limited to any particular format. The contents ofparticular columns may be represented by functions or by functions incombination with other stored information or by stored information inany form, including bitmaps.

More generally, while the value, instance, displacement and occurrencetables have been described as “tables” having rows, columns and cells,the invention is not limited to such structures. Any computerized datastructure for storing the information in these tables may be used. Forexample, the value table described above is a specific example of a“value store” (i.e., it stores the data values representing theuser-view values of information in the database); the instance table isa specific example of an “instance store” and a “connectivity store”(i.e., it both identifies instances of data items in the value store andrepresents relationships among instances of data items in the valuestore); and the displacement table is a specific example of a“cardinality store” (i.e., it represents the frequency of occurrence ofequal instances of data values). The columns of a table are specificexamples of a “list” or, more generally, a “set.” A “set,” for thepurposes of the present invention, comprises one or more “elements,”each having a value or values and a “position,” where the positionspecifies the location of the element within the set. In the discussionabove, a “cell” in a column of a table is an example of an “element” andits position in the set is its row number.

Furthermore, although the embodiments described herein refer to andmanipulate traditional “records”, the invention is not limited torecords and is generally applicable to represent relationships betweendata values. All such variations are alternate embodiments of thisinvention.

Typical database operations supported by the database system of thepresent invention include, but are not limited to:

1) reconstructing physical records,

2) finding records matching query criteria,

3) joining tables in standard ways,

4) deleting and/or adding records,

5) modifying existing records, and

6) combinations of these and other standard database operations toperform useful tasks.

The present invention provides a new and efficient way of structuringdatabases enabling efficient query and update processing, reduceddatabase storage requirements, and simplified database organization andmaintenance. Rather than achieve orderedness through increasingredundancy (i.e., superimposing an ordered data representation on top ofthe original unordered representation of the same data), the presentinvention eliminates redundancy on a fundamental level. This reducesstorage requirements, in turn enabling more data to be concurrentlystored in RAM (enhancing application performance and reducing hardwarecosts) and speeds up transmission of databases across communicationnetworks making high-speed main-memory databases practical for a widespectrum of business and scientific applications. Fast query processingis possible without the overhead found in a fully inverted database(such as excessive memory usage). Furthermore, with the data structuresof the present invention, data is much more easily manipulated than intraditional databases, often requiring only that certain entries in theinstance table be changed, with no copying of data. Database operationsin general are thus more efficient using the present invention. Inaddition, certain operations such as histographic analysis datacompression, and multiple orderings, which are computationally intensivein record-oriented structures, are obtainable immediately from thestructures described herein. The invention also provides improvedprocessing in parallel computing environments.

The database system of the present invention can be used as a back-endfor an efficient database compatible with almost any database front-endemploying industry standard middleware (e.g., Microsoft's Open DatabaseConnectivity (ODBC) or Microsoft's Active-X Data Objects (ADO)) and willprovide almost drop-in compatibility with the large corpus of existingdatabase software. Alternatively, a native stand-alone engine can bedirectly implemented, via, for example, C++ functions, templates and/orclass libraries. Implemented either as a back-end to middleware or as astand-alone engine, this invention provides a database that looksfamiliar to the user, but which is managed internally in a novel andefficient manner.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram of one embodiment of the present invention.

FIG. 2 illustrates a simple ring topology.

FIG. 3 illustrates a topology having subrings with a bridge field.

FIG. 4 illustrates a “star” topology.

FIG. 5 is a flowchart illustrating a routine that finds the value tablecell associated with an instance table cell.

FIG. 6 illustrates a routine that determines the row of the next columnwhere the current column is V/O split.

FIG. 7 is a flowchart illustrating the mapping of a record's topologydata into a linear array.

FIG. 8 is a flowchart illustrating the process of writing a linearizedrecord topology into an instance table.

FIG. 9 is a flowchart illustrating the interchange of the cells of tworecords in the instance table.

FIG. 10 is a flowchart illustrating swapping a live for a deleted cell.

FIG. 11 is a flowchart illustrating finding the undeleted cell (if any)immediately adjoining the deleted cell(s) (if any) for a given value'sinstance cells in the instance table.

FIG. 12 is a flowchart illustrating moving a free (deleted instance)cell in the instance table from its original associated value to theimmediately preceding value.

FIG. 13 is a flowchart illustrating moving a free (deleted instance)cell in the instance table from its original associated value to theimmediately following value.

FIG. 14 is a flowchart illustrating determining the total number ofinstances (including deleted instances) for a given value.

FIG. 15 is a flowchart illustrating the deletion of a previously liveinstance cell in the instance table.

FIG. 16 is a flowchart illustrating the insertion of a new value into avalue table column, when the pointers into that column are not V/Osplit.

FIG. 17 is a flowchart illustrating the insertion of a new value into avalue table column, when the pointers into that column are V/O split.

FIG. 18 is a flowchart illustrating the assignment of a free (deleted)instance table cell to a given value in the value table.

FIG. 19 is a block diagram illustrating the steps in a delete recordoperation.

FIG. 20 is a block diagram illustrating the steps in an add recordoperation.

FIG. 21 is a block diagram illustrating the steps in a modify recordoperation.

FIG. 22 is a block diagram illustrating the steps in a query operation.

FIG. 23 is a block diagram illustrating the steps in a join operation.

DETAILED DESCRIPTION

FIG. 1 illustrates the basic hardware setup of an embodiment of thepresent invention. Program store 4 is a storage device, such as a harddisk, containing the software that performs the functions of thedatabase system of the present invention. This software includes, forexample, the routines for generating the data structures of theunderlying database and for reformatting legacy databases, such as thosein record-oriented files, into those data structures. In addition, thesoftware includes the routines for manipulating and accessing thedatabase, such as query, delete, add, modify and join routines. Datafiles are stored in storage device 2 and contain the data associatedwith one or more databases. Data files may be formatted as binary imagesof the data structures herein or as record-oriented files. Program store4 and storage device 2 may be different parts of a single storagedevice. The software in program store 4 is executed by processor 5,having random access memory (RAM) 7. The selection of the tasks to beperformed by the database system is determined by a user at user station6.

In the following discussion, the term “pointer” is used in a generalsense to include both the C/C++ language meaning (a variable containinga memory address) and, more generally, any data type which is used touniquely describe a location in storage, whether that storage be RAM,disk, etc. A pointer implemented as an integer offset from the beginningof a given data structure will perform the same function as a C/C++pointer while advantageously requiring less storage. The terms memoryand storage, used herein, mean any electronic, optical or other means ofstoring data.

The term multi-dimensional is used herein in a mathematical orquasi-mathematical sense to refer to a view of the data in which ann-column record-based table is considered to occupy an n-dimensionalvector space. It is not used in its narrower sense, sometimes used indata warehousing and On-Line Analytical Processing (OLAP), wheremulti-dimensionality refers to multiple layers of data analysis.

Basic Database Structure

Record-based tables, in which each row represents a record and eachcolumn is a field in the record, are commonly used in state of the artdatabases. A database in accordance with the present invention differsfrom this known structure. In one embodiment, the database is dividedinto two basic data structures; an uncondensed value table and aninstance table. The value table contains the same data instances asprior art databases, but each column may be permuted or otherwisechanged and thus a row no longer necessarily corresponds to a particularrecord. In accordance with this embodiment, the instance table providesthe means for reconstructing the records from the value table.Specifically, in one embodiment, the instance table has the same numberof rows and columns as the uncondensed value table and each cell (i.e.,row/column location) in the instance table contains the row number forthe next field in the same record (“next” being defined below). Thus,the value of the next field of the record containing Value_Table(r, c),where r and c are the row and column of a particular location in thevalue table, is Value_Table(Instance_Table(r, c), next(c)), whereInstance Table(r, c) is the row number of the next field. The functionnext(c) obtains the next column from the current one. In one embodimentof the present invention using a ring topology, next(c)=((c+1) mod n),where n is the number of columns and the columns are numbered from 0 ton−1 (zero-based indexing). In an alternate embodiment (columns numbered1 to n), next(c)=c mod n+1. A wide variety of topologies are possible,each having a corresponding next(c) function. For example, below is adatabase in the standard one-record-per-row format, with 1-based rownumbering:

PRIOR ART DATABASE: ENGLISH SPANISH GERMAN TYPE PARITY Record # (col. 0)(col. 1) (col. 3) (col. 4) (col. 5) 1 One Uno Eins Unit Odd 2 Two DosZwei Prime Even 3 Three Tres Drei Prime Odd 4 Four Cuatro Vier Power2Even 5 Five Cinco Fuenf Prime Odd 6 Six Ses Sechs Composi Even

The corresponding value and instance tables arranged in accordance witha specific embodiment of the present invention are:

VALUE TABLE: ENGLISH SPANISH GERMAN TYPE PARITY Row # (col. 0) (col. 1)(col. 2) (col. 3) (col. 4) 1 Five⁵ Cinco⁵ Drei³ Compos⁶ Even² 2 Four⁴Cuatro⁴ Eins¹ Power2⁴ Even⁴ 3 One¹ Dos² Fuenf⁵ Prime² Even⁶ 4 Six⁶ Ses⁶Sechs⁶ Prime³ Odd¹ 5 Three³ Tres³ Vier⁴ Prime⁵ Odd³ 6 Two² Uno¹ Zwei²Unit¹ Odd⁵

INSTANCE TABLE: ENGLISH SPANISH GERMAN TYPE PARITY Row # (col. 0)(col. 1) (col. 2) (col. 3) (col. 4) 1 1 3 4 3 6 2 2 5 6 2 2 3 6 6 5 1 44 4 4 1 5 3 5 5 1 2 6 5 6 3 2 3 4 1

The value table shown above is created by sorting each column, in thiscase, in alphabetical order. For explanatory purposes only a superscripthas been placed next to each value to indicate its record number in theoriginal database.

After sorting the columns, a row of the value table will not generallycorrespond to a single record in the original database. The instancetable however provides the information necessary to reconstruct thoserecords to the traditional external record view. Specifically, each cell(i.e., row/column location) in the instance table is associated, in theabove embodiment, with a single record. The cell with the samerow/column location in the value table contains the value of the recordfor the field associated with the column. The instance table cell itselfcontains the row number of the next field of the record.

For example, suppose the record containing row 1 of the “English” column(column 0) of the instance table is to be reconstructed. The associatedcell in the value table (i.e., row 1, column 0) contains the value“Five”. Taking the other fields (or columns) in order, first the row ofthe “Spanish” column (column 1) belonging to the same record as row 1 ofthe “English” column (column 0) is determined. The information isprovided by the instance table at row 1/column 0, which in this casecontains the number 1, meaning row “1” of the Spanish column is in thesame record as row “1” of the English column. Next, to determine the rowof the “German” column (column 2) from the same record as row “1” of theSpanish column (column 1), row 1/column 1 in the instance table is read,which contains the number 3, meaning row “3” of the “German” column isfrom the same record. Tracing this record through, row 3 of the “German”column (column 2) in the instance table provides the row of the “Type”column (column 3) in the same record and it contains the number5—meaning row “5” of the “Type” column is from the same record as row 3of the “German” column. Row 5 of the “Type” column (column 3) indicatesthat the corresponding row in the “Parity” column (column 4) is row 6.Finally, row 6 of the “Parity” column (column 4) in the instance table,which is the last column in the table, indicates that the correspondingrow from the “English” column, the first column in the table (column 0),is row 1, which is where the process started. This is due to the ringtopology used in this example.

Thus, in accordance with the embodiment illustrated above, eachrow/column location in the instance table contains the row number in thenext column which belongs to the same record, with the last columncontaining the row number of the same record in the first column. Thelinks between the row/column locations belonging to the same recordwould in this embodiment, form a ring through the instance table, asillustrated in FIG. 2. As this ring is traversed, directly correspondingrow/column locations in the value table allow recovery of each field'svalue. Topologies other than a ring may be used in alternate embodimentsof the present invention.

Generation of Value and Instance Tables from Record-Oriented Data

Value and instance tables can be generated in accordance with thepresent invention using the data in a prior art record-oriented databaseformat.

First, the value table is created by permuting or otherwise changing thedata in each column of the original database. Examples of changes in avalue table column are sort ordering the data and grouping like valuestogether. Different columns may be permuted or changed differently. Asort order should be chosen based on its usefulness for display orretrieval purposes in actual applications. The requirement for apotential sort order is that it have a computable predicate which ordersthe values. Some columns may remain unsorted. In the example above, allcolumns were sorted in alphabetic order.

In one embodiment, during this first step, a temporary intermediatetable is created that facilitates generation of the instance table inthe second step below. The intermediate table, in this embodiment, hasrows that correspond to records, as in the original prior art database,and columns that indicate the permuted position of the correspondingfield in the value table. The intermediate table for the above example(in which the permutations are sort orderings) is as follows:

INTERMEDIATE TABLE: ENGLISH SPANISH GERMAN TYPE PARITY Record # (col. 0)(col. 1) (col. 2) (col. 3) (col. 4) 1 3 6 2 6 4 2 6 3 6 3 1 3 5 5 1 4 54 2 2 5 2 2 5 1 1 3 5 6 6 4 4 4 1 3

Thus, for example, the intermediate table indicates that the “English”field for the original record 5 is in row I of the value table, the“Spanish” field for record 5 is in row I of the value table, the“German” field for record 5 is in row 3 of the value table, the “Type”field for record 5 is in row 5 of the value table and the “Parity” fieldfor record 5 is in row 6 of the value table.

In accordance with this embodiment, the instance table is thendetermined as follows:

Instance_Table(Intermediate_Table(r, c), c)=Intermediate_Table(r,next(c)),

for each row r and column c in the intermediate table and where next(c)is defined above. In other words, each cell in the intermediate tablespecifies a row of the corresponding column in the instance table; thatrow in the instance table receives the value in the next column of theintermediate table.

For example, referring to record number 5 in the example above, the“English” field (column 0) in the intermediate table contains the number1 and the next field. “Spanish.” (column 1) also contains the number 1.Based on this information, the value 1(i.e., the row number of the“Spanish” field of record 5 in the sorted value table) is placed in row1 of the instance table (i.e., the location of the “English” field ofrecord 5 in the sorted value table). Row 1 of the “Spanish” field in theinstance table is set to the value in the “German” field (column 2) ofrecord number 5 in the intermediate table (i.e., the value 3), whichcorresponds to the row number of the “German” field in the sorted valuetable. This process is repeated for each field, again with the lastfield wrapping around to the first field.

A person skilled in the art will recognize that there are equivalentalgorithms, some possibly avoiding the use of an intermediate table, forgenerating the instance table, and the present invention is not limitedto the algorithm shown here.

Condensed Value Table

In certain situations, the data in the value table may be moreefficiently represented in terms of space (e.g., memory or disk usage)if certain columns are “condensed” by eliminating redundant values. Forexample, in the value table above, the “Parity” field has only twodifferent values (“Even” and “Odd”) and the “Type” field has only fourdifferent values (“Compos”, “Power2”, “Prime” and “Unit”). (The numberof unique values for a given field is called its “cardinality.”)Accordingly, in a preferred embodiment of the present invention,redundancy can be eliminated by constructing a condensed value table,which for the example above is as follows:

CONDENSED VALUE TABLE: ENGLISH SPANISH GERMAN PARITY Row # (col. 0)(col. 1) (col. 2) TYPE (col. 3) (col. 4) 1 Five Cinco Drei Composi Even2 Four Cuatro Eins Power2 Odd 3 One Dos Fuenf Prime 4 Six Ses Sechs Unit5 Three Tres Vier 6 Two Uno Zwei

To realize this space savings the storage for the value table must beallocated in the appropriate manner; or example, allocating each columnas a separate vector or list, as opposed to allocating the table as atwo-dimensional array. In addition, the changes applied to the columnsshould group equal values together.

In order to retain the original information of the uncondensed valuetable, an additional structure, referred to herein as a “displacement”table, is provided in a preferred embodiment. In one embodiment of thepresent invention, the displacement table provides either the first orthe last row number at which each unique value in a column occurs in theoriginal uncondensed value table (referred to herein as “first rownumber” and “last row number” format, respectively). For example, thedisplacement table for the condensed value table above is as follows (in“first row number” format):

DISPLACEMENT TABLE: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 (nocondensation, so no displacement 1 1 2 table columns for ENGLISH, 2 4 3SPANISH, or GERMAN) 3 4 6 5 6

The “Parity” column of the displacement table thus indicates that thevalue in the first row of the condensed value table (i.e., “Even”) wasin row I of the uncondensed value table (that is, the value “Even” firstappeared in row 1) and the value in the second row of the condensedvalue table (i.e., “Odd”) first appeared in row 4 of the uncondensedvalue table. Alternatively, the record counts for each value may bestored in the displacement table with the first row for each value beingarithmetically derived, or some arithmetic combination of the count anddisplacement may be used.

A column having field width W bytes and cardinality C (i.e., C uniquevalues) is represented by a “condensed” column of unique values,together with a displacement table of integer values (row numbers) ofsize P bytes in W*C+P*C bytes of RAM, whereas storage of the uncondensedcolumn requires W*N bytes (where N is the number of records). Thus,where

W*C+P*C<W*N, or

C<N/(1+P/W),

this type of compression is beneficial.

The condensed columns in this embodiment generally destroys theone-to-one correspondence between the cells (i.e., row/column locations)of the instance table and the cells of the value table. Thus, duringrecord reconstruction, the value for a cell cannot be retrieved whentraversing the instance table simply by looking at the value in thevalue table at the same row/column location. For example, there is nolonger a cell in the value table at the same row/column location ascolumn 3 (“Type”), row 5 of the instance table in the example above.

Instead, in accordance with a preferred embodiment, the value of thefield associated with Instance_Table(r, c), where c is a condensedcolumn, is given by Value_Table(disp_row_num, c), where disp row_num isthe row number of the cell in the displacement table for the cth columnfor which

Displacement_Table(disp_row_num, c)<=r<

Displacement_Table(disp_row_num+1, c),

where the upper-bound test is not performed if disp_row_num+1 does notexist (i.e., if disp_row_num is the last Displacement_Table row forcolumn c)

(for “first row number” Displacement_Table format), or

Displacement_Table(disp_row_num 1, c)<r<=

Displacement_Table(disp_row_num, c)

where the lower-bound test is not performed if disp_row_num−1 does notexist.

(for “last row number” Displacement_Table format).

Again referring to the example above, to find the value in the valuetable associated with row 5 of column 3, the row in column 3 of thedisplacement table that has the largest value not greater than 5 islocated. That is row 3 in the displacement table above (row 4 having thevalue 6, which is greater than 5). Thus, row 3 of column 3 has the valuein the value table associated with row 5 of column 3 in the instancetable.

Space-Saving Techniques Applicable to Certain Types of Data

In accordance with a specific embodiment, fields with data havingcertain properties can be incorporated into the database system of thepresent invention without using some of the structures described above(i.e., value, displacement and/or instance tables). This is the casewhere the information that would be contained in these structures isalready present in the system in an implicit form; i.e., the informationis deducible from characteristics of the data or other information thatis present. For example if an uncondensed value table column containsthe numbers from 1 to N, there is no need to store this information inthe value table at all because the information is implicitly in theinstance table (row I of the instance table corresponds to value 1, andso forth). Each column (field) descriptor's information states whichstructures are implicit for that field and where and how to obtain theimplicit data. In the example just given, the column descriptor for theinstance table column would state that the value corresponding to eachrow is the row number. This data can then be used in the algorithmsdescribed herein, or other implementations of the algorithms.

When the special circumstances exist where “implicit” structures can beused, space savings can be achieved. Examples of such circumstancesinclude, but are not limited to, the following:

1) A field having unique values requires no displacement list since eachvalue in the field's value list appears only once in the instance list.

2) A field having contiguous, unique, integer values that have the samerange as the rows of the value table requires no value list and nodisplacement list. These values will sort so that their value would beequal to their position in a value list, which would also be theirposition in the instance list. Thus, their value is equal to theirposition (row) in the instance list, so no separate value list isneeded. Since these values are unique, no displacement list is neededeither.

3) A field having values that are the output values of a function ofcontiguous integer input values requires no value list if the functionproduces ordered outputs given ordered inputs (as would be the case, forexample, for a monotonic function). Values are computed by applying thefunction to the position (row) of a cell in the instance list. Since rowpositions are ordered contiguous integers, the output of the functionwill also be ordered. Thus no value list is needed since the values canbe computed from the instance list. Since the functions' output valuesare always unique for unique inputs, no displacement list is necessaryeither.

4) A field having values that are approximated by the output values of afunction of contiguous integer input values can be implemented with areduced-spaced value list if the function produces ordered outputs givenordered inputs (as would be the case, for example, for a monotonicfunction). The value list in this case need only contain the offset ofthe output of the function, instead of the full value, and is arrangedsuch that the offsets plus the outputs of the function produce orderedvalues.

5) A sequence of contiguous instance list elements all associated withthe same data value and all having associated (e.g., next) instance listelements associated with the same data value can be represented by asingle entry identifying, for example, the associated data value theposition of the associated instance element's associated data value andthe number of instance elements in the sequence, with the displacementlist adjusted appropriately.

Additionally, known compression and space-reduction techniques may beapplied to the value and instance tables (and other structures). Forexample, values may be represented using dictionary-type methods,including methods that match bit patterns that are less than the entirelength of a value. An effect of this compression and the compressiontechniques above is to produce more random bit patterns, which in turnimproves hashing performance. In addition, the value table, instancetable and other structures may be compressed, for example, using methodsthat take advantage of repeated bit patterns, such as run-lengthencoding, and word compaction (i.e., packing values into physical datastorage units when there is a mismatch between the value size and thephysical storage unit). The instance table can be further compressed,for example, by reordering the relative positions of the columns and theinstances within columns, where allowable, to optimize performance ofthe above compression techniques.

Alternative Instance Table with Condensed Value Table

The displacement table discussed above slows record reconstructionbecause values from condensed columns can be obtained only aftersearching the displacement table. An alternative configuration used inanother embodiment of the present invention is to modify the instancetable so that entries in a column pointing into a condensed column pointinstead directly into the value table. An additional table is thenprovided, referred to herein as the “occurrence” table, that containsinformation by which the row number of the next column in the instancetable can be calculated. The “occurrence” table contains the occurrencenumber of the particular value pointed to by the corresponding cell inthe instance table. Specifically, in an embodiment in which thedisplacement table is in “first row number” format, row numbers in theinstance table are 1 based, and the occurrence table is also 1 based,the instance table row number of the next field equals

Occurrence_Table(r, c)+

Displacement_Table(Instance_Table(r, c). next(c))−1

Variants on this embodiment include, but are not limited to, zero basedrow numbering in the various structures, and/or zero based occurrencenumbering in the occurrence table, and/or “last row number” format forthe displacement table entries. Such variants affect the formula abovefor determining the instance table row number of the next field. Forexample, for zero based row numbering in the occurrence and instancetables, zero-based occurrence numbering and “last row number” format inthe displacement table, the instance table row number of the next fieldis:

for Instance_Table(r, c)=0,

Occurrence_Table(r, c)

and for Instance_Table(r, c)>0:

Occurrence_Table(r, c)+

Displacement_Table(Instance_Table(r, c) 1, next(c))+1

(because Displacement_Table(Instance_Table(r, c) 1, next(c)) is the lastrow number for the previous value). In all such embodiments, theinstance and occurrence tables could be merged into one table havingtwo-part elements.

In the above example, the TYPE column points into the PARITY column ofthe instance table and the PARITY column in the value table iscondensed. In accordance with this alternative embodiment, the instanceand occurrence tables are as follows:

Alternative Instance Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 13 4 1 6 2 2 5 6 1 2 3 6 6 5 1 4 4 4 4 1 2 3 5 5 1 2 2 5 6 3 2 3 2 1

Occurrence Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 3 2 2 3 1 42 5 3 6 1

Thus, the TYPE column of the instance table now points directly at theassociated row in the value table of the PARITY column. For example, theTYPE column at row 5 of the instance table above contains 2, meaningthat row 2 of the PARITY column in the value table contains the PARITYvalue for the same record associated with row 5 of the TYPE column. Inthis case, row 2 of the PARITY column contains the value “Odd.” Theassociated row of the PARITY column in the instance table is the valuein row 5 of the TYPE column of the occurrence table plus the value inrow 2 of the PARITY column of the displacement table minus 1; which is3+4−1 or 6.

The value, instance, displacement and occurrence tables have beendescribed above as separately stored tables. However, in alternativeembodiments, this need not be the case. For example, the value anddisplacement tables elements can be stored adjacently, and the instanceand occurrence table elements can likewise be stored adjacently, inthose columns with condensed value tables. This may reduce storage cachemisses while retrieving data rows from the database, and also reduceoperand fetch time by allowing the adjacent elements to share the samebase storage address.

Skewering, or Nested Ordering

For a given value and displacement table, there are many possibleinstance and occurrence tables generating the same record set. This isbecause, for a value having multiple occurrences, the occurrences of thevalue may be assigned to the physical records having that value inarbitrary order. In general, the product of the factorials of thevarious values' multiplicities gives the number of instance/occurrencetables which generate the same physical record set. (There are thus414,720 (3!*1!*3!*1!*2!*1!*2!*1!*5!*2!*1!*3!*2!*1!) differentinstance/occurrence tables which generate the records of the SPJ_(mod)database given below.)

In a ring topology, there exists a unique instance/occurrencerepresentation that simultaneously stores N multikey “lexical” orderings(where N is the number of attributes in the database) with no moreoverhead than that required to store the individual sorted columns (acharacteristic referred to herein as “skewering”). Each column C definesone such ordering such that that column is taken as the most significantattribute in the key, with next(C) as the next most significantattribute, etc., to column prev(C) as the least significant attribute inthe key (where prev(c) is the previous column in the ring structure).The ordering is referred to as “lexical” herein because it is the sametype of ordering used to sort words alphabetically, i.e., the words aresorted on the first letter, then words with the same first letter aresorted on the second letter and so forth.

Skewering is illustrated below starting with a prior art table labelledSPJ_(mod) (excerpted from C. J. Date, Introduction to Database Systems,Sixth Edition, inside front cover (1995)):

SPJ_(mod): Rec # S# P# J# QTY 0000 S2 P3 J2 200 0001 S2 P3 J5 600 0002S2 P5 J2 100 0003 S3 P4 J2 500 0004 S5 P2 J2 200 0005 S5 P5 J5 500 0006S5 P6 J2 200

The condensed value and displacement tables, in accordance with theembodiments described above, for SPJ_(mod) are:

Value Table: Row # S# P# J# QTY 0000 S2 P2 J2 100 0001 S3 P3 J5 200 0002S5 P4 500 0003 P5 600 0004 P6 0005 0006

Displacement Table: Row # S# P# J# QTY 0000 0 0 0 0 0001 3 1 5 1 0002 43 4 0003 4 6 0004 6 0005 0006

Three alternative instance/occurrence tables are shown below, eachreproducing the physical record set of SPJ_(mod). The instance andoccurrence tables are shown as a single combined table with entries ofthe form instance/occurrence.

Each value in the value table corresponds to a contiguous block of cellsin the instance/occurrence table, which is defined by the displacementtable entries for that value. These blocks have been indicated byalternating highlights in the instance/occurrence tables printed below.

Instance/Occurrence (version 1): Row # S# P# J# QTY 0000 3/0 0/2 0/0 0/00001 1/0 0/1 1/0 0/1 0002 1/1 1/1 1/1 2/0 0003 2/0 0/4 1/2 2/1 0004 0/00/0 2/0 1/0 0005 4/0 1/0 2/1 2/2 0006 3/1 0/3 3/0 0/2

Instance/Occurrence (version 2): Row # S# P# J# QTY 0000 1/0 0/3 1/0 0/20001 1/1 0/0 0/0 0/0 0002 3/0 1/0 2/0 2/0 0003 2/0 0/2 1/1 2/2 0004 0/00/1 1/2 1/0 0005 3/1 1/1 3/0 2/1 0006 4/0 0/4 2/1 0/1

Instance/Occurrence (version 3): Row # S# P# J# QTY 0000 1/0 0/2 0/0 0/20001 1/1 0/1 1/0 0/0 0002 3/0 1/1 1/1 2/0 0003 2/0 0/4 1/2 2/2 0004 0/00/0 2/0 1/0 0005 3/1 1/0 2/1 2/1 0006 4/0 0/3 3/0 0/1

In version 3 the entries within each value block are in sorted orderbased on their instance and occurrence. The N (here, 4) multikeyorderings naturally defined by SPJ_(mod) are:

(S#,P#,J#,QTY),

(P#,J#,QTY,S#),

(J#,QTY,S#,P#), and

(QTY,S#,P#,J#),

where the fields are ordered from left to right in descending order ofsignificance.

In a prior art record-type table structured database., in order toreconstruct the records in any of these orders, there is a space-timetradeoff. If the records are to be reproduced quickly, and in lineartime, four separate indices are required, specifying the four differentsort orders. To avoid this redundant use of space, a time-consumingsearch is required each time the lexical order used is changed.

The skewered instance/occurrence table eliminates this tradeoff. Any ofthe natural lexical orders can be produced in linear time. For example,to reproduce the order (P#,J#,QTY,S#), the cells in column P# areprocessed top to bottom, reconstructing the record corresponding to eachsuch cell. These records will be in the desired lexical order. Toillustrate, the records corresponding to cells 0000 through 0006 ofcolumn P# are as follows:

cell 0000 of column P#->S5 P2 J2 200

cell 0001 of column P#->S2 P3 J2 200

cell 0002 of column P#->S2 P3 J5 600

cell 0003 of column P#->S3 P4 J2 500

cell 0004 of column P#->S2 P5 J2 100

cell 0005 of column P#->S5 P5 J5 500

cell 0006 of column P#->S5 P6 J2 200

Similarly, records may be reproduced in any of the N lexical orders byproceeding linearly down through the cells of the most significantcolumn of that chosen lexical order.

If the columns of the value table are sorted and condensed, as describedearlier, a skewered instance/occurrence table is formed by creating amulti-key lexical ordering starting at any column. The other N-1multi-key lexical orderings automatically result.

Preserving Standard Database Formats Within the Database System of thePresent Invention

The present invention allows the option of maintaining portions of adatabase in conventional form without incurring significant additionaloverhead. This may be desired, for example, for a column that cannot becompressed and will not be queried. An illustrative embodiment is shownbelow.

For purposes of this illustration, a French column, which will not betranslated into the data structures of the present invention, is addedto the original prior art database as shown below:

PRIOR ART DATABASE: Record # ENGLISH SPANISH GERMAN TYPE PARITY FRENCH 1One Uno Eins Unit Odd Un 2 Two Dos Zwei Prime Even Deux 3 Three TresDrei Prime Odd Trois 4 Four Cuatro Vier Power2 Even Quatre 5 Five CincoFuenf Prime Odd Cinq 6 Six Ses Sechs Composi Even Six

Instead of creating separate columns in the displacement, instance andoccurence tables for the FRENCH column, the FRENCH column is “attached”to one of the other columns, whose displacement, instance, andoccurrence tables were shown earlier.

First, a column is selected to which to “attach” the FRENCH column; anycolumn in the database may be selected for this purpose. In thisexample, the PARITY column has been selected. When reconstructing therecords in the database, the appropriate value for the “FRENCH”attribute is retrieved while determining the “PARITY” value for thatrecord.

In order to attach the FRENCH column to the PARITY column, prior to“value condensation”, the FRENCH cells in the value table are sorted inthe same order as the PARITY cells. As this operation is performedduring the construction of the data strutures in accordance with thepresent invention, a negligible amount of additional effort is required.The sorted FRENCH column is then appended to the condensed value table,as shown below:

Value Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY FRENCH 1 FiveCinco Drei Composi Even Deux 2 Four Cuatro Eins Power2 Odd Quatre 3 OneDos Fuenf Prime Six 4 Six Ses Sechs Unit Un 5 Three Tres Vier Trois 6Two Uno Zwei Cinq

The displacement, instance, and occurrence tables are, in oneembodiment, as follows:

Displacement Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 1 1 2 2 43 3 4 6 5 6

Instance Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 1 3 3 1 6 2 25 4 1 2 3 6 6 3 2 4 4 4 4 1 2 3 5 5 1 2 1 5 6 3 2 3 2 1

Occurrence Table: Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 1 3 2 1 2 32 2 4 1 3 5 1 1 6 3 1

Now, for example, the record corresponding to the query ENGLISH=“Three”is reconstructed. This record, in the prior art database, is given by:

Record # ENGLISH SPANISH GERMAN TYPE PARITY FRENCH 3 Three Tres DreiPrime Odd Trois

To reconstruct this record from the data structures above, first thevalue “Three” in the ENGLISH column of the value table is found, andthen the remaining attributes in the record are reconstructed by tracingthrough the instance table. At the instance cell for the Parity column,the “FRENCH” value corresponding to the record is the value in thecorresponding cell of the FRENCH column in the value table. In theexample, the entry in row 5 of the Parity column of the instance tableis associated with the record being reconstructed. Thus, the “French”value is found in row 5 of the “French” column of the value table, whosevalue is “Trois”.

Alternatively, an unsorted column may be included in the data structuresof the present invention by using the identity permutation as thepermutation for that column (i.e.. the value table for that column willnot be reordered in any way).

Column Merge Compression

In accordance with a further embodiment of the invention, separate valuetable columns can be merged into a single column, referred to herein asa “union column,” with separate displacement list columns for each ofthe original columns. This has the potential advantages of having asmaller value table, pre-joined data expediting join operations andimproved update speed. A value not present in a particular originalcolumn is indicated in the displacement table column by a null range forthat value. For example (assuming a “first row number” formatdisplacement table), if the original column did not have the value atrow ‘r’ of the merged column, the displacement table for that columnwould have the same value at row ‘r’ and row ‘r+1’ (that isDisplacement_Table(r+1,c)-Displacement_Table(r,c)=0). If ‘r’ is the lastrow in the column, its value is set to a number greater than the numberof rows in the instance table for that column.

Alternatively, if the displacement table is in “last row number” format,the null range indicating no instances of value number r is given byDisplacement_Table(r,c)=Displacement Table(r 1,c) (if r 1is a valid rownumber), or, for r equals the lowest valid row number,Displacement_Table(r,c)=0 (for 1-based row numbering) or −1 (forzero-based row numbering).

For example, the following prior art database is considered:

Prior Art Database: Record # FIRST MIDDLE LAST 1 John Frederick Jones 2Steven Allen Smith 3 Frederick Henry Blubwat 4 Albert Allen Brown 5Alexander Graham Bell 6 Alexander The Great 7 Harvey Nelson Tiffany 8Nelson Harvey Tiffany 9 Jackson Albert Poole 10 Henry Edward Billings 11Joseph Blubwat

The corresponding value and displacement tables for the FIRST and MIDDLEcolumns are, in one embodiment:

Value Table: Row # FIRST MIDDLE 1 Albert 2 Alexander Albert 3 FrederickAllen 4 Harvey Edward 5 Henry Frederick 6 Jackson Graham 7 John Harvey 8Joseph Henry 9 Nelson Nelson 10 Steven The

Displacement Table: FIRST MIDDLE 1 1 2 2 4 3 5 5 6 6 7 7 8 8 9 9 10 1011 11

Applying the column-merge space-saving technique results in a singlevalue table column for FIRST and MIDDLE, with the displacement tablecolumns for FIRST and MIDDLE adjusted to point into that column, asshown below:

Value Table with Union Column: FIRST union MIDDLE Albert Alexander AllenEdward Frederic Graham Harvey Henry Jackson John Joseph Nelson StevenThe

Displacement Table: FIRST MIDDLE 1 1 1 2 2 3 4 3 4 5 4 6 5 7 5 8 6 9 710 8 10 9 10 10 10 11 11 12 11

In this embodiment, the absence of a blank FIRST name is indicated bythe first two weeks of the displacement table having the same value(i.e., the difference is zero). The absence of FIRST name values “Allen”and “Edward” and MIDDLE name values “Alexander”, “Jackson, “John”,“Joseph”, and “Steven” are similarly indicated. In addition, a FIRSTname spelled “The” has no occurrences, as is indicated by a displacementtable value of 12, which is greater than the number of records.Conversely, a displacement table value in the last row that is less thanor equal to the total number of records indicates that value has anoccurrence (such as “The” in the last row of MIDDLE name in thisexample).

In this example, if 20 bytes of storage is required for each FIRST andMIDDLE field entry, uncondensed columns would use 440 bytes for 11records. After column merge compression, the union column of the valuetable uses a total of 20*15 bytes and with 2-byte values in thedisplacement table, the displacement table columns use 2*2*15 bytes, fora total of 360 bytes, a space savings of 80 bytes. The space savingswould be correspondingly greater where the value table values for theseparate columns have more overlap. Union columns can also beadvantageously used in implementing joins as described below.

Another space saving technique used in alternate embodiments of thepresent invention is to combine fields of low cardinality into a singlefield having values representing the various combinations of theoriginal fields. For example, in the example above, the TYPE and PARITYfields can be merged into a single field, TYPAR, having valuesrepresenting combinations of TYPE and PARITY values.

Modified Input Table: Record # ENGLISH SPANISH GERMAN TYPAR 1 One UnoEins Odd_Unit 2 Two Dos Zwei EvenPrime 3 Three Tres Drei Odd_Prime 4Four Cuatro Vier EvenPower2 5 Five Cinco Fuenf Odd_Prime 6 Six Ses SechsEvenComposite

Value Table: Row # ENGLISH SPANISH GERMAN TYPAR 1 Five Cinco DreiEvenComposite 2 Four Cuatro Eins EvenPower2 3 One Dos Fuenf EvenPrime 4Six Ses Sechs Odd_Prime 5 Three Tres Vier Odd_Unit 6 Two Uno Zwei

Displacement Table: TYPAR 1 2 3 4 6

Instance Table: Row # ENGLISH SPANISH GERMAN TYPAR 1 1 3 4 4 2 2 5 6 2 36 6 5 6 4 4 4 1 5 5 5 1 2 1 6 3 2 3 3

The process of setting up these data structures is exactly as before,except that the TYPE and PARITY data is taken as a unit, rather thanbeing two separate columns. While the compression of the TYPAR column isless than the compression achieved for the original TYPE and PARITYcolumns (due to the greater number of distinct values), an overallsavings of space results due to the reduced numbers of columns in thedisplacement and instance tables. This space savings is realizable ifthe combined cardinality is sufficiently low. Searching for valuesmatching the first part of the combined field (Even/Odd) is generallyunchanged, but searching for the second part(Composite/Power2/Prime/Unit) is more complicated. To search for, forexample, “Prime” it is necessary to search for both “EvenPrime” and“OddPrime”. In general, C such searches will be necessary, where C isthe cardinality of the first column of the combination. Counts are alsomore complicated for either column involved. More than two columns maybe combined, with similar costs.

Hashing

Hashing comprises a known high-speed data storage and retrievalmechanism that can significantly outperform logarithmic-time binarysearching. Although capable of delivering low-coefficient constant-timeperformance when implemented with an efficient hash function on anappropriate size hash table, the search for high-performance hashparameters can be complex, difficult and data dependent (e.g., dependingon both the number and distribution of values). Still more importantly,hashing has major drawbacks -especially as implemented by state of theart DBMS's. Hash functions typically fail to return ordered resultsrendering them unsuitable for range queries, user requests for orderedoutput, such as SQL “sort-by” and “group-by” queries, and other querieswhose efficient implementation is dependent on sortedness, such asjoins.

By supporting an efficient, ordered, reduced-space representation ofmulti-dimensional data, the present invention obviates the deficienciesof hashing associated with the unorderedness of prior art DBMS's.Moreover, any known hashing technique can be used in conjunction with,and as part of, the present invention.

One example of hashing applied to a sorted value table is a 64 KB hashtable in which each entry in the hash table contains the position of thefirst element in the value table whose first two bytes match the entry'sposition. For example, using 0-based numbering, the first entry in thehash table contains a pointer to the first entry in the value tablewhose first two bytes contain all 0's. The second entry contains apointer to the first entry in the value table whose leading two bytescontains 00000000 00000001, and so on. Every set of consecutive hashtable entries thus uniquely specifies the entire range of valuescontaining the associated leading two-byte bit-pattern for all 64kpossible leading two-bytes. This narrowed range of values can then besearched via for example a binary search, to find any sought value. Twoconsecutive hash table entries with the same value indicates that novalue elements contain the leading two bytes of the first entry.

Additional modifications can be imposed on top of hash tablesimplemented as described above. For example, space can be saved bystripping off the specified two bytes of the values in the value table,because those bytes can be obtained from the hash table. However,additional time is then needed to reconstruct the leading stripped offtwo bytes (if not known from the lookup), before that value can bereturned to the user. This may be done for example by binary searchingthe hash table for the appropriate row in the value table. This may takeup to 16 additional steps for a 64k hash table in the worst case, butaverage performance can be significantly reduced by for example aninterpolation search where this is supported by the regular distributionof a particular data set.

Hashing may also be performed on instance elements to directly return ornarrow the search for an associated value element, serving as analternative to the occurrence table. Any hash function that returns thevalue element associated with a given instance element or some near-byvalue element can be used for this purpose. If a near-by value elementis returned, the specific associated value element is then found bysearching a limited portion of the displacement table. One suchtechnique is a 64 KB hash table with pointers into the value tablemapped onto each possible leading two bytes of an instance table. Therange of displacement table entries to search are given by a hash tableentry and its adjacent entry.

In situations where significant searching is still required bututilizable localized distribution patterns also exists this 64KB entryhash table may be modified to accept two part entries. In such animplementation the first hash table entry still points to the firstvalue associated with the first instance element that contains the twoleading bytes specified by that position in the hash table. The secondhash table entry then provides the address of a function that utilizesthis local distribution to further narrow the search for the valueelement associated with that specified instance element.

The choice of 64KB hash tables corresponding to two byte fields is notmeant to be inclusive. Other byte size choices, other radixes, and byteplacements other than the leading bytes can also be utilized. Moreover,any other known hashing method may also be used.

A General Case Topology for the Instance Table

As described above, in a specific embodiment, individual records arelinked through the instance table in some topology, one of the simplestof which is a circularly linked list, or “ring” topology. So far theexamples have used this simple ring topology—the pointers (i.e.,entries) in the instance table link all the fields in a record in asingle loop with each field having a single “previous” and a single“next” field, as shown in FIG. 2. Other topologies may be used in otherembodiments of the invention.

A topology, as the term is used in connection with the presentinvention, is defined in terms of a graph in which attributes (or otherforms of associated data) are nodes in the graph and links exist betweennodes. Such is clearly the case for the simple ring topology.

Another example of a topology is shown in FIG. 3. In this topology, thefields are separated into two subsets having exactly one field in commonand each subset having a simple ring topology. The field common to bothrings acts as a “bridge” between them. Complete record reconstructionthen requires traversal around both rings, with the bridge field joiningthe record's subrings into a single entity. This topology isparticularly useful if the majority of queries only pertain to thefields in one of the subrings, since that subring can then be traversedand retrieved without traversing and retrieving the full record.

As shown in FIG. 4, still another topology is a star, or spur,configuration, wherein each field represents a doubly-linked spokeradiating from a central hub. Alternatively, the individual spurs(branches) of the star may be either a linear or ring topology. Ingeneral, any of the above topologies could be combined (or othertopologies used) to optimize record storage and retrieval for specificdatabases.

Any defined topology may be either singly or doubly linked and atopology need not be closed as in the examples above. Also, a topologymay be changed between singly and doubly linked at the user's option orautomatically by the database system based on access patterns. Adoubly-linked topology is useful when adjacency of data is important,that is, when the order of the fields in the record is arranged suchthat fields frequently accessed in combination are located topologicallyclose to each other. Singly-linked topologies are more desirable whenfull records (or substantial portions of them) are retrieved, or if apredominant field-retrieval starting point and order are given, sincethe instance list in a singly-linked topology occupies half the storageof the doubly-linked case.

Bridge Field Example

FIG. 3 specifically illustrates a topology wherein each record iscomprised of two separate subrings (ENGLISH>SPANISH->GERMAN andENGLISH->TYPE->PARITY) with ENGLISH as the bridge field. An instancetable that implements such a topology is shown below:

Row # ENGLISH SPANISH GERMA TYPE PARITY 1 1/5 3 5 3 6 2 2/2 5 3 2 2 36/6 6 1 1 4 4 4/1 4 4 5 3 5 5/4 1 2 6 5 6 3/3 2 6 4 1

The ENGLISH column has two outgoing pointers, one for each of thesubrings. To traverse, for example, the record starting in row 1 of theENGLISH column, one of the outgoing pointers is first followed, forexample, the one pointing to row 1 of the SPANISH column. Row 1 of theSPANISH column points to row 3 of the GERMAN column, which in turnpoints back to row 1 of the ENGLISH column. The other outgoing pointeris then followed, leading to row 5 of the TYPE column. Row 5 of the TYPEcolumn in turn points to row 6 of the PARITY column, which again pointsback to row 1 of the ENGLISH column.

Database Implementation

Described below are implementations of routines for inputting data to,maintaining and extracting data from the data structures describedabove. A person skilled in the art will recognize that there are manydifferent algorithms for performing these operations, and the presentinvention is not limited to the algorithms shown herein.

Primitive functions (i.e., functions that are called by other functions)are provided, in one embodiment, to extract the data associated with agiven record and buffer it in linear form, and to write such a bufferedlinear form of the data back into the data structures of the invention.

Data structures in accordance with the present invention are referred tobelow as follows:

1) VALS2: a value table with the columns in sort order and possiblycondensed;

2) DISP: displacement table (column 1 of which has same number of rowsas the corresponding column 1 of VALS2):

3) DELS: deletes table, described below, (column 1 of which has samenumber of rows as the corresponding column 1 of VALS2);

4) INST: instance table;

6) OCCUR: occurrence table.

In the discussion below, an embodiment having a ring topology is used,unless otherwise noted. In the ring topology in this embodiment, thefunctions prev(C) and next(C), which return the previous and nextcolumn, respectively, are as follows: prev(C)=(C+fcount−1) mod fcount,and next(C)=(C+1) mod fcount, where fcount is the number of columns andC is a column number ranging from 0 to fcount−1. An alternate embodimenthas column numbers ranging from 1 to fcount, withprev(C)=(C−1)?(C−1):fcount (using C language notation), and next(C)=Cmod fcount+1. More general topologies may be implemented by definingmore complicated prevo and nexto functions, and/or by analysis of thetopology into simple rings and repeated application of the functionsbelow on those simple rings.

The number of rows in the uncondensed value table, and the instancetable, is represented as reccount. Condensed VALS2 columns will havefewer rows, as will the corresponding DISP and DELS structures. Rows arenumbered from 0 to reccount-1 (zero-based); alternate embodiments mayhave rows numbered from 1 to reccount (one-based).

“V/O splitting” refers to the alternative instance table with condensedvalue table discussed above—the Ith column is called “V/O split” if thepointers in column 1 of the instance table have both a value and anoccurrence component. Parallel treatments for non-V/O and V/O splitcolumns are presented where appropriate. The descriptors for each columnin the instance table indicate whether the column has V/O splitting. Thedescriptors also contain other column/attribute specific information,such as the path of node traversal (i.e., “record topology”), whetherthe column has 0-based or 1-based numbering, etc.

Column descriptors for each column of the value table containconfiguration information, such as its data type, size of field, type ofpermutation/change (e.g. grouped by value, sorted), type of compression(if any), locking information, type of hashing (if any), and etc.

Other tables also have column descriptors containing relevantconfiguration information.

To facilitate the notation of function variables and return values, thefollowing will be used (written in a pseudo-C/C++ notation):

typedef long Row;

typedef int Column;

class ChainVO {Row chainV[fcount]; Row chainO[fcount]; bool valid;}

If a ChainVO object has been filled in with data for a valid record,chainV[C] contains the row number of the record's VALS2 entry in columnnext(C).

If column C of the instance table is V/O split, chainO[C] contains theoccurrence number of the value VALS2[chainV[C], next(C)].

If column C of the instance table is not V/O split, chainO[C] containsthe row number of the record's cell in column next(C) of the instancetable(s).

Record Reconstruction

Given a row number R in column C of the instance table, there exists aunique row number V in VALS2, containing the actual value associatedwith the [R,C] cell of the instance table. The routine shown in FIG. 5,Row get_valrec(Row R, Column C0, determines V from R and C. Step GV1determines whether column C has a DISP table column by checking thecolumn descriptors for column C, and if not, step GV2 sets V to R.Otherwise, if a DISP table column is present, step GV3 determines itsformat (again by checking the column descriptor). If the DISP tablecolumn is in first row number format, then V is set, in step GV4, to thevalue for which DISP [V,C]<=R<DISP[V+1,C] is true, where the upper-boundtest is not performed if V+1 does not exist (i.e., if V is the last DISPtable row for column c). Otherwise, V is set, in step GV5, to the valuefor which DISP[V−1]<R<=DISP[V,C], where the lower-bound test is notperformed if V−1 does not exist (i.e., if V is the first DISP table rowfor column c).

If column C of the instance table is V/O split, row number Next_R incolumn next(C) of the instance table is determined from the V/O entriesin a manner dependent on the DISP and OCCUR implementation as shown inthe flowchart of the R_from_VO(C, I, 0) routine in FIG. 6. R_from_VO( )is passed the current column, C, the row number, I, of the associatedvalue in the next column, next(C), of VALS2 (and DISP) and theoccurrence number, O, of that value. Step 242 sets variables I′ to I andX to zero. Step 243 tests (e.g., via column descriptors) whether columnnext(C) of the DISP table is in “first row” format or “last row” format.If it is “last row” format, step 244 is performed, which sets I′ to 1−1and X to 1. In either case, step 245 is then performed, which setsvariable O′ to O. Step 246 is then performed, which tests whether columnC of the OCCUR table is 1-based. If it is, step 247 is performed, whichsets O′ to O−1. In either case, step 248 is then performed, which setsNext R, the row in the next column, to DISP[I′,next(C)]+O′+X].

FIG. 7 illustrates a function for linearizing a record's topology data,referred to herein as get_chain(Row R0, Column C0). Starting at row R0,column C0 in the instance table, this routine walks through the pointercycle, storing the pointers in a ChainVO object. If the pointer cyclecloses, ChainVO.valid is set to “true”; otherwise it is set to “false”.

In step G1, instance table cell [R0, C0] is set as the starting pointfor record reconstruction. In step G2, the “current cell” [R, C] isinitialized to the starting cell [R0, C0]. In step G3, the columndescriptors for the current column in the instance table are checked tosee if it has V/O splitting.

If the column does not have V/O splitting, step G4 fetches the rownumber in the next column directly by setting R to INST[R, C]. Step G5sets the variable O or loading into the chainO[ ] array. Step G6 usesget_valrec(R, next(C)) to find the corresponding next-column row number,V, in VALS2, DISP, and DELS (if they exist).

If column C has V/O splitting, V (the row number in the next column,next(C), of VALS2, DISP, DELS) and O (the occurrence number for thatvalue) are set, at step G7, to INST[R, C] and OCCUR[R. C], respectively.Step G8 then uses R_from_VO(C, V, O) as described above to find the rownumber. R, in column next(C) of the instance table.

Processing then reconverges at step G9, where chainV[C] and chainO[C]are set to V and O, respectively. Step G10 then replaces C with next(C),and step G11 checks to see if processing has returned to the column atwhich it started (i.e., C0). If not, processing loops back to step G3,and repeats as above. If processing has reached the original startingcolumn, step G12 compares the current value of R to the starting valueR0. If equal, the pointer chain forms a closed loop, indicating that avalid record has been reconstructed and stored in the ChainVO object,and step G14 sets a flag to indicate this. If R is not equal to R0, stepG13 sets a flag to indicate the attempt to reconstruct a record did notresult in a closed loop, which in this embodiment of the invention(which uses a ring topology) indicates that a valid record does not passthrough cell [R0, C0]. In other embodiments of the invention, usingdifferent topologies, the pointer chain between associated instanceelements need not form a closed loop.

The final step of record reconstruction is the conversion of the valuetable row numbers stored in the chainV array to values. The column Cvalue of the record is given by VALS2[chainV[prev(C)], C] (possibly witha hash value prefixed, as described above).

Generalized Record Reconstruction

The description above for using get_chain( ) to reconstruct a record isbased on a simple loop topology in which the next column in the topologydepends only on the current column. The situation may be generalized.The next column may depend on meta-data, other than or in addition tothe current column. For example, the next column might be a function ofboth the current column and the previous column, i.e., C=next(C,prev(C)). In addition, the next column in the topology may depend ondata itself, such as the value, V, of the current cell in the valuetable, or depend on all of the above, i.e., C=next(C, prev(C), V).

Primitives for Record Modification

Primitive functions are now described for one implementation of recorddeletion record insertion, and record modification. The implementationis referred to herein as the “swap” method. In this method a value inthe value table may have deleted as well as nondeleted (“live”)instances. A data structure, referred to herein as DELS, stores a countfor each value of the number of deleted instances it has. Thus, DELS hasthe same number of columns as VALS2 and DISP, and, or any given column,the same number of rows in that column as VALS2 and DISP. The deletedinstances are regarded as free space in the instance table, and theinstance table is maintained such that for any given value in any givencolumn, all live instances are grouped contiguously together and alldeleted instances are grouped contiguously, such that the live instancesprecede the deleted instances or vice versa. This permits free space tobe easily located for assignment to new records or new field values forexisting records, as shown in the functions below. Free spaces can alsobe placed at desired locations in the instance table at setup time byincluding appropriate deleted records in an input data table; thusproviding one implementation for performing insertions prior todeletions.

The put_chain(Column C0, ChainVO rec, int count) function, shown in FIG.8, performs the inverse of get_chain( ). Put_chain( ), starting incolumn C0, writes part or all of the contents of a ChainVO object “rec”into the instance and occurrence tables, for “count” number of columns.The row number written to in column C is obtained from the prev(C)entries in rec. Put_chain( ) does not modify the value or displacementtables.

Step P1 sets the current column number C to the starting column C0. StepP2 checks the column descriptors for the instance table at columnprev(C) to determine whether the previous column is V/O split. Ifprev(C) is V/O split, step P3 sets V (the row number in the value anddisplacement tables) to chainV[prev(C)] and 0 (the row number in theoccurrence table) to chainO[prev(C)]. Step P4 sets R (the row number inthe instance table) to R_from VO(prev(C),V,O).

If the prev(C) column of the instance table is not V/O split, step P5sets R to chainO[prev(C)]. Having now obtained the row number in columnC of the instance table at which to write, step P6 determines if columnC is V/O split. If it is not, step P7 sets INST[R,C] to chainO[C]. Ifcolumn C is V/O split, step P8 sets INST[R,C] to chainV[C] andOCCUR[R,C] to chainO[C]. Processing then moves on to the next column instep P9 and the count of columns to process is decremented in step P10.If step P11 determines that no additional columns are to be written,processing is done, otherwise processing loops back to step P2 andrepeats.

FIG. 9 is a flowchart of the function int swap(Column C, Row R1, RowR2), which modifies the instance table, and other tables such that therecord passing through [R1.C] is made to pass through [R2,C], and therecord passing through [R2,C] is made to pass through [R1,C].

Step S1 fetches the record data for the record passing throughINSTI[R1,C] into ChainVO object ChainVO_1 and fetches the record datafor the record passing though INST[R2,C] into ChainVO_2. Both recordsmust be closed loops, otherwise, an exception is raised by get_chain( ).If both loops are valid, in step S2, the values inChainVO_1.chainV[prev(C)] and ChainVO_2.chainV[prev(C)] are interchangedand the values in ChainV0_.chainO[prev(C)] and ChainVO_2.chainO[prev(C)]interchanged. The exchanged values are in the prev(C) column, becausethat column determines the row number of column C in the instance table.The modifications are then written back into the instance table in stepS3 via the calls to put_chain(prev(C). ChainVO_1, 2) andput_chain(prev(C), ChainVO_2, 2). Put_chain( ) is called with count 2,since one pass through the put_chain( ) loop updates the pointers to theswapped cells, and the second pass updates the contents of the swappedcells. A success code is returned in step S4, and processing iscomplete.

FIG. 10 is a flowchart of the function del_swap(Column C, Row R, RowRd), which modifies the instance and other tables such that the recordthrough [R, C] is rerouted to pass through free cell [Rd, C]. Thisroutine is used, for example, in maintaining segregation of deleted fromlive instances of a given value.

In step D1, data for the record to be rerouted (at row R, column C) isplaced in the ChainVO object ChainVO_r via a call to get_chain( ). Ifget-chain( ) determines that the record is not a closed loop (i.e. it isan invalid record), an exception is raised. Step D2 finds the row, Vd,of the value table associated with the free cell Rd, usingVd=get_valrec(Rd, C).

Step D3 checks the column descriptors for column prev(C) of the instancetable to determine if it is V/O split. If it is not V/O split, step D4is performed, which sets ChainVO_r.chainV[prev(C)] to Vd andChainVO_r.chainO[prev(C)] to Rd. If column prev(C) is V/O split, stepsD5 and D6 are performed. Step D5 sets the occurrence number, Od, in amanner closely related to R_from_VO( ) described above; specifically,Od=Rd DISP[Vd′, C] X (where Vd′=Vd and X=0 if DISP is “first row number”format, or Vd′−Vd 1 and X=1 if DISP is “last row number” format: ifOCCUR is 1-based rather than zero-based, decrement X by 1 in thepreceding). Step D6 puts values into the ChainVO object, settingChainVO_r.chainV[prev(C)] to Vd and ChainVO_r.chainO[prev(C)] to Od. Ineither case, step D7 is then performed, which writes the modified recordtopology back into the instance table, via a call toput_chain(prev(C),ChainVO_r,2), and processing is complete.

FIG. 11 is a flowchart of the function top_undel(Column C, Row R). Cell[R,C] of the instance table has associated value VALS2[V, C], whereV=get_valrec(R,C). If this value has live instances, top undel( )returns the highest row number (in the instance table column C) for suchlive instances; otherwise the routine returns a flag indicating thereare no live instances and a number that is one less than the row numberof the first instance of V.

In step T1, the value table row V associated with instance table row Rof column C is found; i.e., V=get_valrec(R, C). Step T2 sets UP to thehighest row number in the instance table of all instances of value V. IfDISP is in “last row number” format, UP=DISP[V,C]; if DISP is “first rownumber” format, UP=DISP[V+1,C] 1, if DISP[V+1,C] exists, or, ifDISP[V+1,C] does not exist, UP=reccount+X 1 (where X=0 for 0 based rownumbering, and X=1 for 1 based row numbering). If there is no DISP atall for column C, a search through the uncondensed value table columnwill provide the row numbers of the first and last instances of valuenumber V. Step T3 sets DLS to the count of deleted instances for valuenumber V. If there is a DELS structure, DLS=DELS[V,C]; otherwise, DLS isobtained by counting the instances flagged as deleted. Step T4 sets TUto the row number immediately previous to the first deleted instance ofvalue number V (again, in the “swap method” embodiment all liveinstances of value number V precede, or in an alternate embodimentfollow, all deleted instances of that same value). Step T5 sets BOT, therow number of value number V's first instance. For “first row number”format DISP, BOT=DISP[V,C]; for “last row number” format DISP,BOT=1+DISP[V 1,C] (if DISP[V 1,C] exists), or BOT=X (for X based rownumbering, X=0 or X=1). If there is no DISP for column C, then BOT isobtained by a search in the uncondensed value table. Step T6 tests tosee if TU>=BOT. If not, then row number TU does not belong to valuenumber V, but instead is one less than the row number of the firstinstance of V. Step T7 follows, returning TU and a flag indicating thatall instances are deleted. If TU>=BOT, step T6 is followed by step T8,which returns the desired row number.

FIG. 12 is a flowchart of the function I_move_space_uplist(Column C, RowV), which swaps pointers in the instance table, so as to move a freecell from DELS[V+1, C] to DELS[V, C] while maintaining the segregationof live from deleted instances.

Step U1 checks whether there exists a value in row number V+1 in columnC of VALS2. If there is no such value in row number V+1 (i.e., index V+1is out of bounds) an error is reported at step U2 and function is done.Otherwise, step U3 tests whether value row V+1 of column C of VALS2 hasany deleted instances (one of which will be moved by this routine). Ifit has none, step U4 reports an error (no spaces to move), andprocessing terminates. If deleted instances are found, step U5 testswhether value row V+1 has any live instances by calculatingJ=top_undel(C, K) (where K, the row number of the first instance ofvalue number V+1, is given by K=DISP[V+1,C] if column C of DISP is“first row number” format, or K =DISP[V,C]+1 if column C of DISP is“last row number” format, or is found e.g. by linear search in thevalues table if there is no column C of DISP). If so, step U6 callsdel_swap(C, K, J+1) to swap the first live instance (at row K) with thefirst deleted instance (at row J+1), thereby putting the free cell nextto the instance table rows associated with the value V. Step U7 thendeducts this free cell from the count of deleted cells for value numberV+1 (by decrementing DELS[V+1,C] if DELS has a column C), step U8 addsthe cell to the count of deleted cells for value number V (byincrementing DELS[V,C], if DELS has a column C), and step U9 adjusts theDisplacement table (if it has a column C), to reflect the transfer ofthe free cell into value number V's set of deleted instances. If columnC of DISP is in “first row number” format, DISP[V+1,C] is incremented tomove the “floor” of the instance block for value number V+1 up one cell;if column C of DISP is “last row number” format, DISP[V,C] isincremented to move the “ceiling” of the instance block for value numberV up one cell.

FIG. 13 is a flowchart of I_move_space_downlist(Column C, Row V), whichswaps pointers in the instance table, so as to move a free cell fromDELS[V, C] to DELS[V+1, C]. Segregation of live from deleted instancesis maintained. An error is indicated by the return value.

Step W1 determines whether row V+1 of column C of VALS2 is out ofbounds. If it is, step W2 reports an error (i.e., that there is no V+1value to move a free cell to) and returns. Otherwise, step W3 testswhether value V has any deleted instances (one of which will be moved bythis routine). If not, step W4 reports an error (no spaces to move), andprocessing terminates. If deleted instances are found, step W5calculates J=top undel(C, K) (where K, the row number of the firstinstance of value number V+1, is given by K=DISP[V+1,C] if column C ofDISP is in “first row number” format, or K=DISP[V,C]+1 if column C ofDISP is in “last row number” format, or is found e.g. by a linear searchin the values table if there is no column C of DISP) to determinewhether value number V+1 has any live instances. If it does step W6calls del_swap(C, J, K−1) to swap the top live instance (at J) of valueV+1 with the last deleted instance of value V (at K−1), thereby movingthe free cell to the instance table rows associated with value table rownumber V+1. Step W7 then adds this free cell to the count for valuenumber V+1. Step W8 deducts the cell from the count for value number V.Step W9 shifts the boundary between V and V+1 to incorporate thetransferred free cell into value number V+1's set of deleted instances.If column C of DISP is in “first row number” format, DISP[V+1,C] isdecremented to move the “floor” of the instance block for value numberV+1 down one cell; if column C of DISP is “last row number” format,DISP[V,C] is decremented to move the “ceiling” of the instance block forvalue number V down one cell.

FIG. 14 is a flowchart of Row inst_count(Column C, Row V), which getsthe total instance count (live+deleted) for the value at VALS2[V, C],where V is a row number in the VALS2 array and C is the column number.

Step C1 determines if column C of DISP exists. If it does not, entriesof the required value in the uncondensed values column are counteddirectly in step C2, and processing is complete. If column C of DISPdoes exist, step C3 determines if the DISP column is in “first rownumber” format. If it is not, processing continues with step C4. If theDISP column is in “first row number” format, processing continues withstep C8. Step C4 determines how to set the variable BOT: if V is thelowest allowed index for DISP, then step C5 sets BOT=−1 for 0 based rownumbering, or sets BOT=0 for 1 based row numbering. If on the other handV 1 is a valid index for DISP, step C6 sets BOT=DISP[V 1,C]. Step C7then finds the total instance count as DISP[V,C] BOT, and processing iscomplete. If step C3 determines that DISP[c] is in “first row number”format, processing goes to step C8, which determines how to set thevariable TOP: if V is the highest allowed index for DISP, then step C9sets TOP=reccount+X for X based row numbering (X=0 or X=1). If on theother hand V+1 is a valid index for DISP, step C10 sets TOP=DISP[V+1,C].Step C11 then finds the total instance count as TOP DISP[V,C], andprocessing is complete.

FIG. 15 is a flowchart of a function that deletes the instance tablecell [R,C]; that is, cell [R,C] in the instance table is placed in thefree pool, and the appropriate DELS count is incremented. If the newlydeleted cell is surrounded by live cells for the same value, it is movedto the live/deleted boundary via del_swap( ). The record's topology datais in a ChainVO object called VO. This routine is used in recorddeletion and in updating a field in an existing record; see “Updateexisting record” step E6 and “Delete record through cell” step DR7,below.

In step DC1, the VALS2/DELS/DISP row number in column C is obtained fromthe ChainVO structure representing the record being processed, and thecorresponding DELS count is incremented. Step DC2 determines whether thepointers to the cell in question are V/O split. If prev(C) is not V/Osplit, step DC3 gets the row number R directly from VO.chainO. Ifprev(C) is V/O split, step DC4 reconstructs R from VO.chainO viaR=R_from_VO(prev(C),V,VO.chainO[prev(C)]). In either case, the topundeleted cell for the same value is located, via a call totop_undel(C,R), and set to T in step DC5 (which is guaranteed to exist,since the cell to be deleted is such an undeleted cell). Step DC6 thentests whether the cell being deleted is the topmost undeleted cell forthe value in question. If not, a swap is performed in step DC7 via acall to del_swap(C,T,R) in order to maintain segregation of live fromdeleted instances. Processing is then complete.

FIG. 16 is a flowchart of insert_v(Row V, Column C, void *newvalptr),which inserts into column C of VALS2, at row V, a new value notpreviously present (pointed to by newvalptr), and modifies theassociated DELS and DISP tables (if they exist) to accommodate the newvalue. This function is used only when column prev(C) of the instancetable is not V/O split.

Step IV1 tests whether the value at the insertion point (row V) has anylive instances; i.e., whether top_undel(R,C) returns a flag indicatingthat there are no live instances (R here is the INST row number of anyinstance of value number V; if column C of DISP exists, R=DISP[V,C] issuch an instance; if DISP has no column C, R=V is such an instance. Ifthere are no live instances of the value at row V, step IV2 isperformed, which overwrites the old value with the new value, andprocessing is complete. If there is a live instance, a search isperformed for the closest value having no live instances. Step IV3initializes an index J to 1. Step IV4 determines whether both V+J andV−J are out of bounds. If they are, meaning that no further searching ispossible and no free value slots have been found, step IV5 is performedwhich allocates additional space, for one or more additional slots, incolumn C of VALS2, DISP and DELS. Step IV6 tests whether V+J is inbounds and whether the value at row V+J has any live instances (viatop_undel(R′,C), where R′ is the INST row number of any instance ofvalue number V+J; if column C of DISP exists, R′=DISP[V+J,C] is such aninstance; if DISP has no column C, R′=V+J is such an instance. If V+J isin bounds and value V+J has no live instances, then the branch startingwith step IV7 is executed; otherwise, step IV10 is executed.

The branch starting with step IV7 is a loop which shifts the values inrows V to V+J−1 in VAL2 to the next higher row, thus opening an unusedrow at row V of VALS2. Step IV7 adds the deleted instances of valuenumber V+J to value number V+J−1, e.g., if DELS has a column C, thenDELS[V+J−1,C] is set to DELS[V+J−1,C]+DELS[V+J,C]. Step IV8 shifts thevalues at row V+J−1 of VALS2, (and DELS and DISP, if they exist) to rowV+J of VALS2, DELS and DISP, respectively (i.e.,VALS2[V+J,C]=VALS2[V+J−1,C]; DELS[V+J,C]=DELS[V+J−1,C];DISP[V+J,C]=DISP[V+J−1,C]), and then decrements J. Step IV9 iteratesstep IV8 until J=0. Then, step IV15 inserts the new value at row V ofVALS2 and updates DELS and DISP to indicate that the new value has noinstances; i.e., VALS2[V,C]=new value, DELS[V,C]=0, andDISP[V,C]=DISP[V+1,C]. Processing is then complete.

Step IV10 tests whether V−J is in bounds, and whether the value at rowV−J has live instances. If there are live instances, step IV11 isexecuted, which increments J and loops back to step IV4. If V−J is inbounds and value number V−J has no live instances, step IV12 isexecuted, which begins a loop which shifts the values in rows V−J+1 to Vin VALS2 to the next lower row, thus opening an unused row at row V ofVALS2. Step IV12 adds the deleted instances of value number V−J to valuenumber V−J−1 (e.g. if DELS has a column C, then DELS[V−J−1,C] is set toDELS[V−J−1,C]+DELS[V−J,C]). Step IV13 then shifts values for rows V−J+1of VALS2, (and DELS and DISP, if they exist) to rows V−J of VALS2, DELSand DISP, respectively (i.e. VALS2[V−J,C]=VALS2[V−J+1,C];DELS[V−J,C]=DELS[V−J+1,C]; DISP[V−J,C]=DISP[V−J+1,C]), and decrements J.Step IV14 iterates step IV13 until J=0, at which point step IV 15inserts the new value, with no instances, as described above, andprocessing is complete.

FIG. 17 is a flowchart of insert_vov(Row V, Column C, void *newvalptr),which inserts a new value (pointed to by newvalptr) into column C ofVALS2, when column prev(C) of the instance table is V/O split. The setof column descriptors for such a column includes a value h_val, thehighest currently used row number in VALS2 column C, and h_ptr, thehighest currently used row number in the instance table. Both VALS2 andthe instance table are preferably allocated with extra blank space attheir ends to accommodate new entries. In one embodiment of the presentinvention inserted new values are written at the end of the VALS2 table(rather than at their sort-order position as in insert_v( ), above). Apermutation list giving the added VALS2 row numbers in sort order isupdated by insertion of the new indexes at their proper sort positions.The permutation list is used to access the new part of the VALS2 columnin sort order (e.g., by a binary search algorithm). A search for a valuein the VALS2 column would first search the original, sorted part of thevalue list and, if no match was found, a second binary search would usethe permutation list to search among the new values.

Other means can be utilized to avoid otherwise unnecessary searching ofthe appended new value list. These include, but are not limited to, thefollowing: (1) The original sorted value list together with any instancetable values for V/O splitting may be re-organized in the background orovernight to keep the appended new value list as short as possible; (2)A bit flag embedded in the value list or associated displacement list orstanding alone identifies when new values have or have not been appendedbetween a given old value and a contiguous old value to avoidunnecessary searching of the appended new value list when no new valuesfall within that range; (3) A pointer mechanism, possibly associatedwith an existing hash function or with a hash function used expresslyfor this purpose narrows the range of the appended new value list thatneeds to be searched.

The insert_vov routine is used by the “Insert new record” and “Updateexisting record” routines.

Step VOV1 gets, from the column descriptors for column C, h_val, thehighest VALS2 slot number already used in column C, and h_ptr, thehighest instance table row number already used. In step VOV2, h_val andh_ptr are incremented to point to the first available empty slots incolumn C of VALS2 and the instance table, respectively, if such emptyslots exist. Step VOV25 determines if h_val is in range and, if it isnot in range, step VOV26 allocates additional space, for one or moreadditional slots, in column C of VALS2, DISP and DELS. Step VOV27 thendetermines if h ptr is in range and, if it is not, step VOV28 allocatesadditional space, for one or more additional slots, in column C of INST.

In step VOV3, the VALS2 slot is filled in with the new value (so thath_val once again indicates the highest used slot number); i.e.,VALS2[h_val,C] is set to new value. In step VOV4, the new value's propersort order position is found within the set of new appended values, andthe value h_val is inserted at that position in the permutation list. Instep VOV5, the DISP and DEL.S structures are updated; DISP[h_val,C] isset to h_ptr. and DEfLS[h_val,C] is set to 1, since incrementing h_ptrby one has in effect allocated space for one record, which is not yet a“live” record, and is thus part of the DELS pool. Finally, in step VOV6,row V, which is equal to the sort position of the new value) is changedto h_val for proper inclusion into the ChainVO object used in theroutine calling insert_vov( ) (specifically, chainV[prev(C)] must pointto the actual location of the new value, i.e. h_val).

FIG. 18 is a flowchart of insert_c(ChainVO VO, Column C). This routinechecks if there is a deleted instance of the column C value specified byVO and, if not, migrates a deleted instance from the nearest valuehaving one. The row number of the first deleted instance of the value isthen stored in VO.chainO (either as offset or row number, depending onV/O splitting).

Step IC1 sets V to the VALS2 row number in column C for the value whosefirst deleted instance is to be written into VO.chainO[prev(C)] (i.e.,V=VO.chainV(prev(C)). Step IC2 determines whether this value has adeleted instance (i.e., whether DELS[V,C]>0, or by direct count oftombstoned, flagged, cells in those embodiments in which deleted cellsare marked by a flag). If this value has no deleted instances, step IC3is performed, which does an iterative search for the closest value witha deleted instance (similar to the search done in “insert_v( )” above)to determine whether there is a deleted instance for any value. If nodeleted instance is found, step IC4 allocates additional space, for oneor more additional slots, in column C of INST, and adjusts the datastructures as needed to indicate that the additional slots are free(i.e., deleted instances). Step IC5 then does an iterative applicationof I_move space uplist( ) or I_move_space_downlist( ) until the value atrow V of VALS2 has a deleted instance. Step IC6, which is also donedirectly after step IC2 if value table row number V already has adeleted instance of its own, sets J to a row number in column C of theinstance table that is associated with value table row number V. Ifcolumn C of DISP exists, J DISP[V,C] is such an instance. If DISP has nocolumn C, J=V is such an instance. Step IC7 then finds the first deletedinstance, K, which is at one plus the row number returned bytop_undel(C,J) (i.e., K=top_undel(C,J)+1). Step IC8 then tests whetherthe previous column, prev(C), is V/O split. If it is not, step IC9 setsVO.chainO[(prev(C)] to K. Otherwise, step IC10 sets VO.chainO[(prev(C))]to K=DISP[V′,C]-X, where V′=V and X=0 if DISP is “first row number”format, or V′=V 1 and X=1 if DISP is “last row number” format. In eithercase, processing is then complete.

Record Deletion

Record deletion is performed in this embodiment by identifying thepointer table cells of the record and marking them as deleted in theDELS column when it exists, or by a tombstoning flag. Such free space isthen available for use when the data in existing records is changed ornew records are added. In unsorted columns that are “attached” asdescribed above, the delete status of a cell in the attached column isidentical to that of the cell to which it is attached.

Deletion of a record is illustrated in FIG. 19. In step DR1, the recordto be deleted (the one containing cell [R0, C0]) is loaded into ChainVOobject VO. If the pointer chain starting from [R0, C0] does not form aclosed loop, an exception is raised, terminating processing. Step DR2determines whether the record has any deleted cells (by, e.g., repeatedtesting of whether top_undel( ) returns a row number less than thecell's row number), because only a live record can be deleted. If therecord contains a deleted cell, step DR3 reports an error and processingis complete. If no cells in the record are deleted, step DR4 isperformed, which initializes the current column, C, to column C0 and thecurrent column, R, to R0. Step DR5 then deletes the current cell, [R,C], and sets R to the row number in next(C), via a call to “Deletepointer(s) cell [R,C],” described above. Step DR6 then sets C to thenext column via a call to next(C). If the current column is not equal tothe starting column (C not equal to C0), step DR7 loops back to stepDR5; otherwise, step DR7 terminates processing.

For example, if record “Five” is deleted in the example above, all cellsin INST (and possibly OCCUR) belonging to this record will be marked as“deleted” (by tombstones for columns not having DISP/DELS, and by theDELS column where it exists). For the above example the DELS table willlook as follows (if record “Five” is the only one that has beendeleted):

DELETES Row # ENGLISH SPANISH GERMAN TYPE PARITY 1 0 0 2 0 1 3 1 4 0 5 6

A count of records with, for example. TYPE=“Prime” is now obtained fromcolumn TYPE of DISP as 6−3 (difference of row 4 entry and row 3 entry),indicating that there are three such records; however, the DELETESstructure indicates that one of those records is deleted, hence the truetotal is 6−3−1=2. The number of records with PARITY=“Odd” is obtainedsimilarly. DISP shows the value 4 in row 2 of the PARITY column. Hence,rows 4 through 6 (last row) of INST are associated with “Odd”, threerecords in all. Again, there is a “1” in the PARITY column of DELETFS,row 2, so the number of undeleted records with PARITY=“Odd” is 3−1=2.

Record Insertion

Insertion of a new record is illustrated in FIG. 20. Step IR1 obtainsthe values for the new record's fields (for example, from a user) andstores them in a temporary buffer. A ChainVO object VO is also allocatedfor record construction and insertion. Fields belonging to an “attached”column, as described above, are treated essentially as suffixes to thevalues in the column to which they are attached. Step IR2 sets thecurrent column C to the first column. Step IR3 then searches the valuetable, VALS2, for the value, V, specified as the new record's column Cvalue, and returns the sort position, V, of the value and whether thevalue already exists. Note that when prev(C) is V/O split, the search inVALS2 is done in two parts; first, the original, sorted value list issearched and then, if no match is found, the appended listed of addedvalues is searched through the permutation list (as described above).Step IR4 tests whether the value already exists. If it does, step IR8 isexecuted; otherwise step IR5 is executed.

Step IR5 determines whether column C is V/O split (in which case thecolumn descriptors for prev(C) of the instance table would indicate V/Osplitting). If it is V/O split, step IR7 is performed, which inserts thenew value into VALS2, DISP, and DELS via a call to insert_vov(V,C,*newvalue). If it is not V/O split, step IR6 is performed, which inserts thenew value int VALS2, DISP, and DELS via a call to insert_v(V,C,*newvalue).

In either case, step IR8 then builds chainV in the VO object, settingVO.chainV[prev(C)]=V, where V is either the sort position found in stepIR3 (if column prev(C) is not V/O split), or h_val found in “insert_vov()” (if column prev(C) of the instance table is V/O split).

Step IR9 determines whether the value at row V of VALS2 has deletedinstances (by checking the DELS table column if it exists, otherwise bycounting tombstones). If it does not, step IR10 is performed, whichprovides a deleted instance for the value via a call to insert_c(VO,C)(updating VO.chainO[prev(C)] in the process).

If there is already a deleted instance, the branch starting with stepIR11 is performed. In step IR11, the row number, K, in the instancetable of the first deleted instance is found via a call totop_undel(C,J)+, where J represents the row number, in column C of theinstance table, of an instance of value number V. In particular, ifcolumn C of DISP exists, then J=DISP[V,C] is such an instance. In theabsence of column C of DISP, J=V is such an instance. Step IR12 thendetermines whether column prev(C) of the instance table is V/O split. Ifprev(C) is V/O split, then step IR13 sets VO.chainO[prev(C)] to theappropriate occurrence number, i.e., K−DISP[V′,C]−X, where V′=V and X=0if DISP is in “first row number” format, or V′=V 1 and X=if DISP is“last row number” format). If prev(C) is not V/O split, step IR14 loadsthe row number of the deleted instance, K, into VO.chainO[prev(C)].

In either case, step IR15 then deducts the deleted instance from thedeletes count for row V of VALS2. Step IR16 then moves on to the nextcolumn, setting C to next(C). Step IR17 then determines whether allcolumns have been processed (i.e., whether C equals column 0) and, ifnot, loops back to step IR3. Otherwise, step IR18 writes the wholeobject VO back into the instance table (via a call to put_chain(0, VO,fcount), where fcount is the number of fields in the record), andprocessing is complete.

Record Updates

Updating an existing record is illustrated in FIG. 21. In step El, theuser chooses a record to be updated and, in step E2, the record isloaded into a ChainVO object VO and in a temporary buffer. Any cell ofthe record may be selected as a starting point for loading VO. In stepE3, the user optionally changes any or all field values (unlessread-only conditions pertain) in the temporary buffer. Step E4initializes the “current column,” C, to start at the first column,column 0. Step E5 determines whether the user changed the value infield/column C. If the user has not changed the value, no processing isneeded in this column, and step E20 is performed, which advances C tothe next column, by setting C to next(C).

If the column C value has changed, step E6 is performed, which deletesthe formerly live instance of the record's old value, via a call to“Delete pointer(s) cell [R,C]”, described above. Step E7 then searchesthe value table (using, e.g., a binarv search) for the value specifiedas the new record's column C value, and returns the sort position of thevalue., whether matched or not. When prev(C) is V/O split, the search ofthe value table is done in two parts, first, the original, sorted valuelist is searched and then, if no match is found the appended listed ofadded values is searched through the permutation list (as describedabove). Step E8 tests whether the new value was already in VALS2. If itwas, the branch starting with step E12 is performed; otherwise thebranch starting with step E9 is performed.

Step E9 determines whether the column prev(C) is V/O split. If it is V/Osplit, step E11 inserts the new value into VALS2, DISP, and DELS via acall to insert_vov(V,C,*new value); otherwise, step E10 inserts the newvalue via a call to insert_v(V,C,*new value). Step E12 builds the chainVof the new record, one element at each pass, settingVO.chainV[prev(C)]=V, where V is either the sort position found in stepE7, if column prev(C) is not V/O split, or h_val found in “insert_vovo,”if column prev(C) is V/O split. Step E13 determines whether the value atrow V has deleted instances by looking in the DELS table, or countingtombstones if DELS has no corresponding column. If the value has nodeleted instances, the branch starting at step E14 is executed. Step E14provides a deleted instance for the value, via a call to insert_c(VO,C),and updates VO.chainO[prev(C)] in the process. Otherwise, the branchstarting at step E15 is executed. Step E15 finds the row number, K, incolumn C of the first deleted instance of value number V; i.e.,K=top_undel(C,J)+1, where J represents the row number, in column C ofthe instance table, of an instance of value number V. In particular, ifcolumn C of DISP exists, then J=DISP[V,C] is such an instance;otherwise, J=V is such an instance. Step E16 tests whether columnprev(C) is V/O split. If it is V/O split, then step E18 is performed,which sets K to the occurrence number; i.e., K=K−DISP[V′,C]-X, whereV′=V and X=0 if DISP is “first row number” format, or V′=V1 and X=1 ifDISP is “last row number” format. In either case, step E17 is thenperformed, which loads the proper data into VO.chainO; i.e.,VO.chainO[prev(C)]=K. Step E19 then removes the deleted instance fromthe deletes count. Step E20 then changes C to the next column(C=next(C)). Step E21 tests whether all columns have been processed;i.e., whether C=0, which was the starting column. If C has not returnedto the starting column, execution loops back to step E5 and repeats withthe new C value. Otherwise, step E22 writes the whole object VO backinto the instance table, via a call to put_chain(O, VO, fcount), andprocessing is done.

Queries

Because columns in the database system of the present invention may beindependently sorted, queries can be performed very quickly. Any of avariety of efficient, standard search or lookup algorithms can be used.For example, a simple binary search delivers a worst case timeperformance of C log2 n, and an average performance of C log2 (n/2).Other search techniques can also be used., with the best one beingdependent on the specific situation and characteristics of the data.

Parallelization can be implemented on top of either a binary midpoint orinterpolation search. Such techniques for parallelization of searchalgorithms are known in the art. Further parallelization can be obtainedby grouping rows of sorted data elements from each column in size ncontainers, where n equals either the number of processors or anintegral multiple thereof. The system tracks the upper and lowerboundary points of these containers, removing the necessity of databeing sorted within them. Where n equals the number of processors,entire containers can then be searched and manipulated with the sameefficiency that single rows are operated upon in single processorenvironments, while displacements within these containers becomeinconsequential.

As an example, a flowchart for a query for all records having a chosenvalue for a given field is illustrated in FIG. 22. In step 221, thevalue table for a particular field is searched for the values matchingthe chosen value, M. Again, because the columns are generally in sortedorder, a binary search can be used (as well as other search techniques).Step 222 tests whether a matching value was found. If a matching valueis not found, that is reported in step 223.

If a matching value is found, for example at Value_Table(r, c), steps224 and 225 are performed, which determine the row in the value tablewith matching values (step 224) and reconstruct the records associatedthose rows (step 225). For a non-condensed column, the record associatedwith the cell with the matching value is reconstructed as discussedabove; then contiguous rows (r+1, r+2, . . . , r−1, r−2, . . . ) arechecked for matching values, and if additional matching values are foundthe records associated with those cells are also reconstructed. Thesearch of contiguous rows can stop in any direction when a non-matchingvalue is found.

For a condensed column, the range of instance table row numbers thatpoint to the matching value is obtained from the displacement table.Again, where the matching value was found at Value_Table(r, c), thecontents of Displacement_Table(r. c), if in “first row number” format,is the beginning of the range and Displacement_Table(r+1, c)−1 is theend of the range (unless r is the last row in the displacement table, inwhich case the end of the range is the last row in the instance tablefor the column). Step 225 then reconstructs, as described above, therecords containing the cells identified in the instance table.

More complicated queries, such as (FIELD_X=M) .AND. (FIELD_Y=N),(FIELD_X=M) .OR. (FIELD_Y=N), and so on, are also efficientlyimplemented using the data structures described herein. For example, anAND query can be implemented by finding (as above) all records matchingFIELD_X=M, then testing for the second condition (e.g., FIELD_Y=N)during record reconstruction.

A significant advantage of the present invention is that the ANDcondition query can be performed with fewer steps because, for condensedcolumns, the number of rows meeting each of the conditions is alreadyknown from the displacement table. The first condition to be applied canthen be chosen to be the one with fewer matches. In contrast, existingdatabase engines typically must perform “analysis” cycles periodicallyin order to have only an approximate idea of the cardinality found ineach column. With the embodiments of the present invention describedabove, the cardinalities are known ahead of time for each value. An ORquery can be implemented, for example, by finding all records matchingthe first condition and then finding all records matching the secondcondition that were not already matched by the first condition. If anarbitrarily complex expression is known in advance to be a frequentquery, a sorted column for that expression can be included in the value,displacement and instance tables just as though it were an ordinaryfield, and the same rapid binary search method would apply.

Data structures, corresponding to those discussed above, can beinitialized with the results of a query, thus facilitating sub-queries.

SQL Functions

Many SQL functions may be supported by the data structures in accordancewith the present invention with a trivial amount of computationaleffort. For example, the COUNT function, which returns the number ofrecords having a specified value for a given attribute, is available inconstant time by accessing the entries for that value and the adjacentvalue in the displacement table. The MAX and MIN functions, which findthe records with the maximum and minimum values for a given attribute,can be implemented by accessing the top and bottom cells, respectively,in the given column. The MEDIAN function, which finds the record withthe middle value for a given attribute, can be implemented by searchingfor the location of the displacement table closest to half the recordcount, and returning the associated value. The MODE function, whichfinds the value with the largest number of occurrences, can beimplemented by a linear search for the largest difference in adjacentdisplacement table values, and using the corresponding value. Thesefunctions (called aggregation functions) are efficient because thedisplacement table is directly related to the histogram of value countswithin the column.

INSERT, DELETE, and UPDATE operations are supported as shown, forexample, in the embodiments of these operations described above.

The present invention also supports other types of SQL queries. Forexample, suppose there are two tables, labeled “PLANT” and “EMPLOYEE”,whose various attributes are shown below:

PLANT: PLANT_NAME PLANT_NUMBER MANAGER_ID etc...

EMPLOYEE: EMPLOYEE_NAME EMPLOYEE_ID JOB ADDRESS etc...

A query, for example, to find the name of each manager of each plant isexpressed in SQL as follows:

SELECT EMPLOYEE_NAME FROM PLANT, EMPLOYEE WHERE MANAGER_ID = EMPLOYEE_ID

If the representations for the two tables are uncoupled, i.e., they eachhave separate value, instance, displacement, and occurrence tables,simple nested loops can be used to test for equality between values inthe MANAGER_ID column of the PLANT database and the EMPLOYEE_ID columnof the EMPLOYEE database, and, for each match, the correspondingEMPLOYEE_NAME in the EMPLOYEE database can be found.

If the instance, displacement, and occurrence tables of the EMPLOYEE andPLANT databases point to the same value table with a singleMANAGER_ID/EMPLOYEE_ID column, then, for each displacement table thathas an entry for a particular column for both EMPLOYEE and PLANT, thecorresponding EMPLOYEE_NAME in the EMPLOYEE table can be found.

Joins

A join operation combines two or more tables to create a single joinedtable. For example, two tables may each have information about EMPLOYEESand a join might be performed to determine all information in bothtables about each EMPLOYEE.

In order to perform a join, tables are typically linked through aprimary or candidate key in one of the tables. The primary or candidatekey is an attribute or attribute combination that is unique. A redundantrepresentation of this same attribute or attribute combination, called aforeign key, is contained in one or more other tables. The foreign keysneed not have the same cardinality as the primary or candidate key andneed not be unique.

A join operation is defined as a subset of an extended Cartesian productof two or more tables. A Cartesian product of two record-based tablescombines each row of the first table with every row of the second table.For example, if the first table had M rows and N columns and the secondtable had P rows and Q columns, the Cartesian production would have M×Prows and N+Q columns. An extended Cartesian product is a Cartesianproduct that results from inserting null values into one or more of theoriginal tables.

A membership function defines the subset of the extended Cartesianproduct of two or more tables that are in the join answer set (i.e., theoutput of the join operation). The membership function contains acomparison condition and a join criterion that jointly determine aparticular join type, which together with column selectors determine theanswer set returned by the join.

The comparison condition specifies a logical operator. It is, forexample, what appears between the attribute names in the “Where” clauseof an SQL SELECT statement. The most common comparison condition isequality and the corresponding join is referred to as an equi-join.Other conditions such as greater than or less than are also possible.

The join criterion specifies the answer set of a join, given acomparison condition, specific join attributes and column selectors. Forconvenience equi-joins on a single attribute in each table are assumedin the discussion below. Join criteria include inner join (the joinanswer set consists of those rows that appear in both tables), outerjoin (further subdivided into left outer join, right outer join and fullouter join—the join answer set consists of all the rows in the left,right or either table together with the corresponding rows of the othertable where they exist, null filled otherwise), union join (the joinanswer set consists of those rows that appear in only one of the twotables, with the remaining values in those rows null filled), and crossjoin (the join answer set consists of the full non-extended Cartesianproduct of the two tables).

The column selectors specify which columns are returned in the answerset of the join.

In prior art database systems, joins tend to be extremely costly instorage space and/or processing time, requiring either pre-indexed datato maintain sortedness or a time intensive search involving multiplepasses over the entirety of each attribute that is being joined. In thelatter case, the time to do a two column join is proportional to thesquare of the number of rows, a three-column join proportional to thecube, etc., for tables of equal cardinality and equal to the n-foldproduct of record counts otherwise.

The present invention largely eliminates the overhead associated withjoins. All attributes can be sorted, and union columns can eliminate theneed to maintain redundant copies of data. Membership functions can beimplemented efficiently through the displacement table, variousalternate displacement tables, bit maps, and/or n-valued logicfunctions.

Alternate Displacement Tables

Certain properties of the union column lead to various modifications tothe displacement table columns, which are particularly useful inperforming joins. The “full” displacement structure has, for eachcolumn, rows that are in one-to-one correspondence with the rows of thecorresponding column of the (condensed) value table. The contents of acell of the full displacement table, in one embodiment, is the rownumber of the first (or last, depending on the embodiment) instance inthe instance table of all instances possessing the corresponding valuein the value table. If a value in the value table has no instances atall, identical entries in the displacement table in the correspondingand next (alternatively, previous) cells will indicate this.Consequently, if there are many more values without than with instances(referred to hereafter as the “sparse” case), there are many morerepeated than different values in the displacement structure, leading toredundancy in the displacement table. In the full displacement table, inone embodiment, the entries are in sorted order, so that for row numberJ in the instance table, the corresponding row number V in the valuetable is that for which DISP[V,1]<=J<DISP[V+1,1] (for a displacementcolumn in the “first row number” format), or (for last row numberformat) DISP[V−1,1]<J<=DISP[V,1].

In a “sparse” case, an alternative format for displacement tablecolumn(s) (referred to below as the “condensed” displacement format) canbe used to remove redundancy. In this format, displacement table entrieshave two parts:

1) DV, the row number in the value table of a value having instances,and

2) DD, the starting (alternately, ending) row number in the instancetable of the actual instances of the value.

The row number entries DD are in sorted order; DV will naturally also bein sort order when the underlying value table is in sort order.

For row number J in column 1 of the instance table, the correspondingrow number V in the value table is found as follows:

1) find K, via, e.g., a binary search, such that DD[K]<=J <DD[K+1] (for“first row number” format) or (for “last row number format”)DD[K−1]<J<=DD[K];

2) V=DV[K].

A condensed displacement column, when appropriate, simultaneously savesstorage space and speeds up binary searching. However, testing for thepresence of instances of a given value is a constant-time lookup using afull displacement column, but a log time binary search using a condenseddisplacement column.

In the case where values without instances are rare, a further alternateformat of the displacement table (referred to herein as “dense” format)permits all missing values to be found quickly. In this alternateformat, displacement table entries have a bitflag to identify valueswith no instances, and, for those values with no instances, the contentsof the entry is a pointer to the next value without instances. (Theoriginally defined displacement list, lacking the linked list of missingvalues, is referred to below as “full” format).

Examples of Alternate Displacement Tables

Sparse and dense displacement columns are illustrated below for priorart, record-type, tables J_(mod) and SPJ_(mod) (excerpted from C. J.Date, Introduction to Database Systems. Sixth Edition, inside frontcover (1995)):

J_(mod): Rec # J# JNAME CITY 0000: J1 Sorter Paris 0001: J3 OCR Athens0002: J4 Console Athens 0003: J5 RAID London 0004: J6 EDS Oslo

SPJ_(mod): Rec # S# P# J# QTY 0000: S2 P3 J2 200 0001: S2 P3 J5 6000002: S2 P5 J2 100 0003: S3 P4 J2 500 0004: S5 P2 J2 200 0005: S5 P5 J5500 0006: S5 P6 J2 200

Value, displacement, instance and occurrence tables for J_(mod) andSPJ_(mod) are as follows:

J_(mod): VALS: Row # J# JNAME CITY 0000 J1 Console Athens 0001 J3 EDSLondon 0002 J4 OCR Oslo 0003 J5 RAID Paris 0004 J6 Sorter

DISP: Row # J# JNAME CITY 0000 0 0 0 0001 1 1 2 0002 2 2 3 0003 3 3 40004 4 4

Combined Instance/Occurrence Table: Row# J# JNAME CITY 0000 4/0 0/1 1/00001 2/0 2/0 2/0 0002 0/0 0/0 3/0 0003 3/0 1/0 4/0 0004 1/0 3/0 0/0

SPJ_(mod): VALS: Row # S# P# J# QTY 0000 S2 P2 J2 100 0001 S3 P3 J5 2000002 S5 P4 500 0003 P5 600 0004 P6 0005 0006

DISP: Row # S# P# J# QTY 0000 0 0 0 0 0001 3 1 5 1 0002 4 3 4 0003 4 60004 6 0005 0006

Instance/Occurrence Table: Row # S# P# J# QTY 0000 1/0 0/2 0/0 0/2 00011/1 0/1 1/0 0/0 0002 3/0 1/1 1/1 2/0 0003 2/0 0/4 1/2 2/2 0004 0/0 0/02/0 1/0 0005 3/1 1/0 2/1 2/1 0006 4/0 0/3 3/0 0/1

To facilitate rapid join queries on, for example, over the J# attributeof tables J_(mod) and SPJ_(mod), a union column for J# is created andsparse and dense displacement table columns corresponding to the unioncolumn are incorporated into the displacement tables for J_(mod) andSPJ_(mod). The J# union column for J_(mod) and SPJ_(mod) is as follows:

J# Union for J_(mod) and SPJ_(mod): Row # J# 0000 J1 0001 J2 0002 J30003 J4 0004 J5 0005 J6 0006 J7

The appropriate type of displacement column for each of J_(mod) andSPJ_(mod) is determined by comparing the cardinality of the union columnto the cardinalities of the corresponding columns of the J_(mod), andSPJ_(mod) tables. The cardinality of the J# union column above is 7. Thecardinality for the J# column in the J_(mod) table is 5. Since nearlyall values in the union column also appear in the J_(mod) table, a densedisplacement column is constructed for that attribute. For the SPJ_(mod)table, the cardinality of its J# column, 2, is compared to thecardinality of the union column, 7. Since the J# values are “sparse” inthis case, a sparse displacement column for the SPJ_(mod) column isconstructed. The J# union column, the displacement column for J_(mod)and the displacement column for SPJ_(mod) are shown below, all in onetable for illustration purposes:

Union and Displacement Columns: J_(mod) SPJ_(mod) Row # J# UnionD-column D-column 0000 J1 0 1/0 0001 J2 *6  4/5 0002 J3 1 0003 J4 2 0004J5 3 0005 J6 4 0006 J7 *1 

In the dense displacement column for J_(mod), the asterisks arebitflags, indicating (1) that J_(mod) does not have a record with thecorresponding value, and (2) that the value which follows is a pointerto the next value in the union column which does not appear in J_(mod).Those values in the union column which do not appear in J_(mod) are thusmaintained in a circular linked list.

In the sparse displacement column for SPJ_(mod), the entries arepresented in the format DV/DD, where DV is a pointer to a value in theunion column which has instances in the SPJ_(mod) table and the DDpointer is the starting row number in the SPJ_(mod) instance/occurrencetable of the instances of the given value.

Modelling Joins Using Bit Maps

The J# union column for the J_(mod) and SPJ_(mod) tables may also besupplemented by bit maps. The bit map will indicate whether a givenvalue in the union column is contained in the J_(mod) or SPJ_(mod)tables. A procedure for creating such a structure is illustrated below.The bit map in this example consists of seven entries, 0000 through0006, one for each value of J# present in the union column. Each entryis associated with 2 bits. The first bit is set to 1 if thecorresponding value of J# is present in the J_(mod) table, 0 otherwise.Likewise, the second entry is set to 1 if the J# value is present in theSPJ_(mod) table, and 0 otherwise.

Since the J_(mod) table is represented by a dense displacement column,its bit entries are initialized to ‘1’ (since almost all the values inthe union column are contained in J_(mod)). Likewise, since SPJ_(mod) isrepresented by a sparse displacement column, its bit entries areinitialized to ‘0’ (since few of the values in the union column arepresent in SPJ_(mod)). The initial bit map is thus as follows:

Initial Bit Map: Row # J_(mod)/SPJ_(mod) 0000 1/0 0001 1/0 0002 1/0 00031/0 0004 1/0 0005 1/0 0006 1/0

The next step is to construct the final bit map. For the J_(mod) column,the values not present in the J# union column are contained in the ringof non-present values in its dense displacement column. The ring istraversed and the corresponding entries in the bit map are set to ‘0’.

To correct the entries for the SPJ_(mod) column, the DV pointers pointto the values in the union column which have entries in the SPJ_(mod)tables and the corresponding entries in the bit map are set to ‘1’. Thefinal bit map is as follows:

Final Bit Map: Row # J_(mod)/SPJ_(mod) 0000 1/0 0001 0/1 0002 1/0 00031/0 0004 1/1 0005 1/0 0006 0/0

N-valued logic functions can model join operations with functions overbit maps. This technique is illustrated by the example below withreference to prior art tables S. P. and J (from C. J. Date, Introductionto Database Systems, Sixth Edition, inside front cover (1995)):

S: S# SNAME STATUS CITY S1 Smith 20 London S2 Jones 10 Paris S3 Blake 30Paris S4 Clark 20 London S5 Adams 30 Athens

P: P# PNAME COLOR WEIGHT CITY P1 Nut Red 12 London P2 Bolt Green 17Paris P3 Screw Blue 17 Rome P4 Screw Red 14 London P5 Cam Blue 12 ParisP6 Cog Red 19 London

J: J# JNAME CITY J1 Sorter Paris J2 Display Rome J3 OCR Athens J4Console Athens J5 RAID London J6 EDS Oslo J7 Tape London

In this example a union join is performed on the “CITY” columns of theS, P, and J tables. This entails finding only those records whose “CITY”value appears in exactly one of the S, P, or J tables.

The first step is to construct a union column for the CITY columns of S,P, and J, if one does not already exist.

The second step is to associate with each value of the union columnthree bits, corresponding to the S, P, and J tables, respectively. A bitis set to ‘Y’ (i.e., ‘1’) if the CITY value is present in theappropriate table, and to ‘N’ (i.e., ‘0’) otherwise. Such a table isdepicted below:

Union column and bit map: CITY S P J Athens Y N Y London Y Y Y Oslo N NY Paris Y Y Y Rome N Y Y

For a particular value of the CITY attribute, records with that valueappear in the union join if and only if that value of CITY appears inexactly one of the S, P, and J tables, i.e., exactly one of the bits inthe bitmap for the union column equals ‘Y’. One illustrativeimplementation of a function that finds such rows is function f(tempcolumn) described below. The function's domain consists of the twovariables ‘temp’ and ‘colum’. The variable ‘temp’ can be one of threevalues; ‘Y’, ‘N’, or ‘D’. The variable ‘column’ is either ‘Y’ or ‘N’.Lastly, the return value of function f also consists of the three values‘Y’, ‘N’, or ‘D’.

For each value of CITY in the union column, function f is appliediteratively to the bit values in each of the three columns: the variable‘column’ is set to the bit value of the current column, and ‘temp’ isassigned the result of the previous application of the function f. Forthe first column, S, ‘temp’ is initialized to ‘N’. After the finaliteration, if the result is ‘Y’, the value appears in the union join; ifthe result is ‘N’ or ‘D’, the value does not appear.

The function f is defined as follows: Temp Column Return Value N N N Y NY D N D N Y Y Y Y D D Y D

Applying this function to the first row in the Union Table,corresponding to the value ‘Athens’, yields the following result:f(f(f(‘N’,‘Y’),‘N’),‘Y’), which equals ‘D’. Hence ‘Athens’, whichappears twice in the row, does not appear in the Union Join.

Applying function f to the row for ‘Oslo’ yields the following result:f(f(f(‘N’,‘N’),‘N’),‘Y’), which equals ‘Y’. Flence ‘Oslo’, which appearsexactly once in the row, does appear in the union join.

FIG. 23 is a flowchart that illustrates a join operation. In step 231,the user picks tables to join. In step 232, any tables not alreadyrepresented in the data structures of the present invention areconverted into such structures. Next, in step 233, columns, if any, arepicked whose values are part of the logical expression defining the joinas a subset of the extended Cartesian product. Step 234 tests if anycolumns were selected. If no columns were selected, the join correspondsto the full non-extended Cartesian product, and record reconstructionproceeds via step 238 without conditional constraints (i.e., everyrecord from each table is combined with every record of every othertable).

Otherwise step 235 is performed which tests if more than one column wasselected. If so, those columns are combined into a combined column (suchas in the “combined columns” description above).

If the appropriate value table union column does not already exist, step237 creates it, together with its associated displacement table columns.Step 238 then modifies the ranges in the routines that produce the joinoutput, using full, dense and/or sparse displacement lists, bitmaps,multivalued logic functions or any combination of them, so as to matchthe type of join, using the appropriate comparison condition and joincriterion.

For example, the answer set of an inner join is limited to instancetable cells corresponding to displacement table rows in which the tablesinvolved have non-null record ranges. This can be determined, forexample, from their displacement table entries. Corresponding instancecell entries derived from each such displacement table row (and possiblyone of the adjacent rows, depending on the implementation) provide theinstance table cell ranges for each table for all matching records. Theanswer set is restricted to only those records, producing theappropriate inner join answer set. The answer sets for other types ofjoins can be similarly determined from, for example, the displacementtable.

Combination with the query methods discussed above enablesimplementation of a full range of statements like SQL's “SELECT . . .WHERE . . . . ”

While the invention has been particularly shown and described withreference to particular illustrative embodiments thereof, it will beunderstood by those skilled in the art that various changes in form anddetails are within the scope of the invention, which is defined by theclaims.

1. A system for storing and retrieving tuples comprising: a collectionof a number of instances corresponding to a value of a first attribute;a cardinality element corresponding to the number of instances; whereinthe cardinality element is updated each time the number of instanceschanges and wherein at least one instance indicates at least one otherinstance corresponding to a value of a second attribute and the secondattribute is different from the first attribute.
 2. A system for storingand retrieving tuples comprising: a collection of a number of instancescorresponding to a value of a first attribute; a cardinality elementcorresponding to the number of instances; wherein the value can bederived from the cardinality element and wherein at least one instanceindicates at least one other instance corresponding to a value of asecond attribute and the second attribute is different from the firstattribute.
 3. A system for storing a plurality of tuples, each tuplecomprising at least a first attribute having a first attribute value anda second attribute having a second attribute value, the systemcomprising: for at least two tuples having identical first attributevalues and identical second attribute values, a single instance elementthat identifies the first attribute value and the second attributevalue, and a cardinality element comprising information regarding thenumber of tuples having the identical first and second attribute values.4. The system of claim 3 wherein the instance element comprises thecardinality element.
 5. A system for storing a plurality of tuples, eachtuple comprising at least a first attribute having a first attributevalue and a second attribute having a second attribute value, the systemcomprising: a. a value store storing the values of the first and secondattributes of the plurality tuples; b. an instance store identifyinginstances of the values in the value store associated with each tuple;c. a connectivity store storing information regarding relationshipsamong instances; and d. a cardinality store storing informationrepresenting frequencies of occurrence of instances of equal value,wherein a particular value in the value store associated with aparticular instance in the instance store is derived using thecardinality store.
 6. The system of claim 5 wherein the instance storeand the connectivity store are not distinct.